diff --git "a/context_encoding_model/_tp0_bk3/log-neuron-cc.txt" "b/context_encoding_model/_tp0_bk3/log-neuron-cc.txt"
new file mode 100644
--- /dev/null
+++ "b/context_encoding_model/_tp0_bk3/log-neuron-cc.txt"
@@ -0,0 +1,5272 @@
+2025-08-07T13:53:51Z INFO 47918 [root]: /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/bin/neuronx-cc compile --framework=XLA /home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/model.MODULE_b3ddbc97e5f0d1d64c82+155de413.hlo_module.pb --output /home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/model.MODULE_b3ddbc97e5f0d1d64c82+155de413.neff --target=trn1 --auto-cast=none --model-type=transformer '--tensorizer-options=--enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma' --lnc=1 -O1 '--internal-hlo2tensorizer-options= --modular-flow-mac-threshold=10 --verify-hlo=true' --logfile=/home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/log-neuron-cc.txt --verbose=35
+2025-08-07T13:53:51Z INFO 47918 [root]: NeuronX Compiler version 2.20.9961.0+0acef03a Python version 3.10.12 HWM version 2.20.0.9961+0acef03a NumPy version 1.26.4 Running on AMI ami-040348201d80b58ad Running in region usw2-az4
+2025-08-07T13:53:51Z INFO 48502 [root]: XLA detected
+2025-08-07T13:53:51Z INFO 48502 [root]: Pipeline: HLOToTensorizer Frontend StaticIOTranspose WalrusDriver BIRLinker Kelper NeffWrapper
+2025-08-07T13:53:51Z INFO 48502 [root]: Intermediate files stored in /home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/neuronxcc-dfnjq5y6, output in /home/ubuntu/qwen3/context_encoding_model/_tp0_bk3
+2025-08-07T13:53:51Z INFO 48502 [pipeline.Pipeline.0]: Job Pipeline len(in_states) 1
+2025-08-07T13:53:51Z INFO 48502 [pipeline.Pipeline.0]: Processing input #0
+2025-08-07T13:53:51Z INFO 48502 [pipeline.Pipeline.0]: Running pipeline Pipeline.0
+2025-08-07T13:53:51Z INFO 48502 [pipeline.Pipeline.0]: Starting job job.HLOToTensorizer.0
+2025-08-07T13:53:51Z INFO 48502 [job.HLOToTensorizer.0]: Job HLOToTensorizer len(in_states) 1
+2025-08-07T13:53:51Z INFO 48502 [job.HLOToTensorizer.0]: Processing input #0 +2025-08-07T13:53:51Z INFO 48502 [job.HLOToTensorizer.0]: IR signature: 9068f3ba4f55e1b8b35adde74efc6a9e617baa344783aaee62353f9181c3092c for model.MODULE_b3ddbc97e5f0d1d64c82+155de413.hlo_module.pb +2025-08-07T13:53:51Z INFO 48502 [job.HLOToTensorizer.0]: Executing: /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/starfish/bin/hlo2penguin --input /home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/model.MODULE_b3ddbc97e5f0d1d64c82+155de413.hlo_module.pb --out-dir ./ --output penguin.py --remat --max-costly-ops=2 --max-live-in-size=5 --max-remat-chain-size=10 --max-mem-multiple=1.8 --min-def-use-distance=500 --remat-policy=transformer --allow-same-pass-remat=true --layers-per-module=1 --partition --emit-tensor-level-dropout-ops --modular-flow-mac-threshold=10 --verify-hlo=true --native-to-custom-softmax --partitioner-opts='--transformer' +2025-08-07T13:53:52Z INFO 48502 [job.HLOToTensorizer.0]: DEBUG: needsModular_PreSplit? Yes. 
macCnt 3711162974208 threshold 4398046511104 num non-trivial Ops 3871 +INFO: Number of Native SoftmaxDx's detected and replaced: 0 +INFO: Number of Native Softmax's detected and replaced: 38 + +Pre-Partition Pre-Opt Histogram: +total HLO instructions: 10617 + reshape 2091 19.69% ################################################################ + broadcast 1731 16.30% #################################################### + convert 1281 12.07% ####################################### + transpose 1268 11.94% ###################################### + constant 815 7.68% ######################## + parameter 475 4.47% ############## + slice 445 4.19% ############# + add 365 3.44% ########### + multiply 327 3.08% ########## + dot 326 3.07% ######### + get-tuple-element 295 2.78% ######### + select 255 2.40% ####### + compare 222 2.09% ###### + call 186 1.75% ##### + concatenate 148 1.39% #### + tuple 73 0.69% ## + scatter 73 0.69% ## + negate 72 0.68% ## + all-reduce 72 0.68% ## + custom-call 38 0.36% # + divide 37 0.35% # + iota 7 0.07% + gather 6 0.06% + all-gather 3 0.03% + reduce 3 0.03% + sine 1 0.01% + cosine 1 0.01% + maximum 1 0.01% + +INFO: IoStatistics: total inputs: 475 +INFO: IoStatistics: total outputs: 73 +INFO: IoStatistics: total passthrough tensors: 0 +INFO: IoStatistics: total outputs read from: 0 +INFO: IoStatistics: total redundant outputs: 0 +INFO: IoStatistics: total ifmap size (KiB): 8072802 +INFO: IoStatistics: total ofmap size (KiB): 73728 +INFO: IoStatistics: total must-alias size (KiB): 73728 +INFO: IoStatistics: total may-alias size (KiB): 0 +INFO: HloMacCount has found 3711162908672 +INFO: Traffic has found 8885483693 +INFO: AIF 835.33 + +Pre-Partition Post-Op Histogram: +total HLO instructions: 6623 + reshape 1424 21.50% ################################################################ + convert 992 14.98% ############################################ + transpose 941 14.21% ########################################## + constant 523 7.90% 
####################### + parameter 475 7.17% ##################### + broadcast 410 6.19% ################## + dot 325 4.91% ############## + custom-call 223 3.37% ########## + multiply 219 3.31% ######### + add 219 3.31% ######### + get-tuple-element 151 2.28% ###### + slice 147 2.22% ###### + concatenate 146 2.20% ###### + select 110 1.66% #### + compare 76 1.15% ### + scatter 73 1.10% ### + negate 72 1.09% ### + all-reduce 72 1.09% ### + gather 6 0.09% + iota 5 0.08% + all-gather 3 0.05% + reduce 3 0.05% + pad 2 0.03% + sine 1 0.02% + divide 1 0.02% + tuple 1 0.02% + maximum 1 0.02% + rng 1 0.02% + cosine 1 0.02% + +INFO: Found compute bound graph +DEBUG: needsModular_PreSplit? Yes. macCnt 3711162908672 threshold 4398046511104 num non-trivial Ops 2702 +DEBUG: transformer model +INFO: Partitioner configs:ModularFlow BO LBL SA ConcatGraphs: 1 MaxDisj:2 MaxSep:4 LPM:1 +INFO: Markers NOT detected +Potential split-points stats: #CC 75 #AR 72 #AG 3 #BN 0 nClamp 0 +DEBUG: needsModular_SplitFinder? Yes. +ModuleSplitter initial partitioning... #parts 75 +ModuleSplitter initial partitioning... Done. +INFO: Num of unique Module Definitions: 6 +DEBUG: DefMap: 0 1 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 3 2 73 74 +New disjoint wave: start 2 len 70 NumReps: 35 macs 3607772528640 +INFO: Attempting to identify and split optimizer at end +First non-zero-mac/used part from the end is 73 +Not enough zero-mac parts. skip +INFO: Optimized 0 all-reduce split instructions +INFO: Number of splitPoints: 37 +ModuleSplitter initial partitioning... #parts 37 +ModuleSplitter initial partitioning... Done. +Remat: gather-iota 0 matches, 0 ops rematted +INFO: Alias legality verification of partitions PASSED. +INFO: No transposable_weight_idx attrs found +INFO: Peak intermediate memory demand is at Partition 1. Num live intermediates at peak is 9 and memory usage is 35127300 bytes. 
+INFO: Please refer to LiveRangeReport_PostHloPart.txt for detailed intermediate lifetime info. +DEBUG: DefMap: 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 36 +Wrote HLO netlist to hlo_netlist.json +Wrote graph partitions in debug_info_hlo_partitions.json +Processing partition 0 +INFO: Number of Native SoftmaxDx's detected and replaced: 0 +INFO: Number of Native Softmax's detected and replaced: 0 +Replaced 0 dropout sequences with OffloadedDropout +INFO: HloMacCount has found 25769803776 +INFO: Traffic has found 705741606 +INFO: AIF 73.03 +HLO Ops used in computation: add all-gather all-reduce broadcast compare concatenate constant convert cosine custom-call dot gather get-tuple-element iota multiply negate parameter reshape scatter select sine slice transpose tuple +Invoking RemoveOptimizationBarriers pass +Processing partition 1 +INFO: Number of Native SoftmaxDx's detected and replaced: 0 +INFO: Number of Native Softmax's detected and replaced: 0 +Replaced 0 dropout sequences with OffloadedDropout +INFO: HloMacCount has found 103079215104 +INFO: Traffic has found 246989348 +INFO: AIF 834.69 +HLO Ops used in computation: add all-reduce broadcast compare concatenate constant convert custom-call dot get-tuple-element multiply negate parameter reshape scatter select slice transpose tuple +Invoking RemoveOptimizationBarriers pass +Processing partition 2 +INFO: Number of Native SoftmaxDx's detected and replaced: 0 +INFO: Number of Native Softmax's detected and replaced: 0 +Replaced 0 dropout sequences with OffloadedDropout +INFO: HloMacCount has found 77620576256 +INFO: Traffic has found 798521419 +INFO: AIF 194.41 +HLO Ops used in computation: add all-gather all-reduce broadcast compare concatenate constant convert custom-call divide dot gather get-tuple-element iota maximum multiply pad parameter reduce reshape rng scatter select slice transpose tuple +Invoking RemoveOptimizationBarriers pass + +2025-08-07T13:53:52Z INFO 48502 
[job.HLOToTensorizer.0]: IR signature: 4cb5bb30df98c0f4fe837212bb465c077814190c1515012736514ef3b85e9119 for sg0000/HLOToTensorizer +2025-08-07T13:53:52Z INFO 48502 [job.HLOToTensorizer.0]: IR signature: 7d62bccc8bf6f747c9f4be1d037998542e378a1e9c073d1354821dafa6e067fe for sg0001/HLOToTensorizer +2025-08-07T13:53:52Z INFO 48502 [job.HLOToTensorizer.0]: IR signature: f703193f38eab27445c0b7b02fa8c772086cee3728a75bd67c3dcc8214cedceb for sg0002/HLOToTensorizer +2025-08-07T13:53:52Z INFO 48502 [job.HLOToTensorizer.0]: Job #0 finished +2025-08-07T13:53:52Z INFO 48502 [pipeline.Pipeline.0]: Finished job job.HLOToTensorizer.0 +2025-08-07T13:53:52Z INFO 48502 [pipeline.Pipeline.0]: Starting job job.Frontend.0 +2025-08-07T13:53:52Z INFO 48502 [job.Frontend.0]: Job Frontend len(in_states) 1 +2025-08-07T13:53:52Z INFO 48502 [job.Frontend.0]: Processing input #0 +2025-08-07T13:53:52Z INFO 48502 [job.Frontend.0]: Start model loading +2025-08-07T13:53:52Z INFO 48502 [job.Frontend.0]: Start tensorization +2025-08-07T13:53:52Z INFO 48502 [job.Frontend.0]: Num jobs: 128 +2025-08-07T13:53:52Z USER 48502 [root/Tensorizer/Tensorizer]: Running Tensorizer +2025-08-07T13:53:52Z INFO 48502 [Tensorizer]: Max workers: 3 +2025-08-07T13:53:52Z INFO 49124 [Tensorizer]: Building model from Penguin script "penguin.py.000000"... +2025-08-07T13:53:52Z INFO 49125 [Tensorizer]: Building model from Penguin script "penguin.py.000001"... +2025-08-07T13:53:52Z INFO 49126 [Tensorizer]: Building model from Penguin script "penguin.py.000002"... 
+2025-08-07T13:53:52Z INFO 49125 [Tensorizer]: Tensorizer options: --enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma --run-pg-layout-and-tiling --enable-dse-after-mask-propagation --disable-concat-delinearizer --num-neuroncores-per-sengine=1 --num-neuroncores-per-sengine=1 --internal_dynamic_dma_scratch_size_per_partition=16384 --disable-bitcasted-transpose --dont-verify-after-all --fp32-cast=none --mm-transpose-type=fp32 --disable-expensive-checks --disable-max-stride-tiling --hbm-scratchpad-page-size-in-bytes=536870912 --enable-replication --max-local-tensor-tile-size-in-bytes=32768 --tensor-layout-p-order=0 --tensor-layout-b-order=1 --enable-advanced-delinearization --weight-coalescing-threshold=512 --enable-bir-converter=enable --enable-tritium-loopfusion --enable-softmax-kernel --model-type-transformer --enable-isl-in-injective-check --enable-dge-on-io-dma --enable-dge-on-indirect-dma --enable-dge-on-vector-indirect-dma --keep-rng-tensor-op +2025-08-07T13:53:52Z INFO 49124 [Tensorizer]: Tensorizer options: --enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma --run-pg-layout-and-tiling --enable-dse-after-mask-propagation --disable-concat-delinearizer --num-neuroncores-per-sengine=1 --num-neuroncores-per-sengine=1 --internal_dynamic_dma_scratch_size_per_partition=16384 --disable-bitcasted-transpose --dont-verify-after-all --fp32-cast=none --mm-transpose-type=fp32 --disable-expensive-checks --disable-max-stride-tiling --hbm-scratchpad-page-size-in-bytes=536870912 --enable-replication --max-local-tensor-tile-size-in-bytes=32768 --tensor-layout-p-order=0 --tensor-layout-b-order=1 --enable-advanced-delinearization --weight-coalescing-threshold=512 --enable-bir-converter=enable --enable-tritium-loopfusion --enable-softmax-kernel --model-type-transformer --enable-isl-in-injective-check --enable-dge-on-io-dma --enable-dge-on-indirect-dma --enable-dge-on-vector-indirect-dma --keep-rng-tensor-op 
+2025-08-07T13:53:52Z INFO 49125 [sg0001/Tensorizer/DoNothing]: Running DoNothing +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/DoNothing]: Running DoNothing +2025-08-07T13:53:52Z INFO 49125 [sg0001/Tensorizer/DoNothing]: Finished (changed=True) +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/DoNothing]: Finished (changed=True) +2025-08-07T13:53:52Z INFO 49126 [Tensorizer]: Tensorizer options: --enable-ccop-compute-overlap --cc-pipeline-tiling-factor=2 --vectorize-strided-dma --run-pg-layout-and-tiling --enable-dse-after-mask-propagation --disable-concat-delinearizer --num-neuroncores-per-sengine=1 --num-neuroncores-per-sengine=1 --internal_dynamic_dma_scratch_size_per_partition=16384 --disable-bitcasted-transpose --dont-verify-after-all --fp32-cast=none --mm-transpose-type=fp32 --disable-expensive-checks --disable-max-stride-tiling --hbm-scratchpad-page-size-in-bytes=536870912 --enable-replication --max-local-tensor-tile-size-in-bytes=32768 --tensor-layout-p-order=0 --tensor-layout-b-order=1 --enable-advanced-delinearization --weight-coalescing-threshold=512 --enable-bir-converter=enable --enable-tritium-loopfusion --enable-softmax-kernel --model-type-transformer --enable-isl-in-injective-check --enable-dge-on-io-dma --enable-dge-on-indirect-dma --enable-dge-on-vector-indirect-dma --keep-rng-tensor-op +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/DoNothing]: Running DoNothing +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/DoNothing]: Finished (changed=True) +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/LegalizeOpLevelAlias]: Running LegalizeOpLevelAlias +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/LegalizeOpLevelAlias]: Finished (changed=False) +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/LegalizeOpLevelAlias]: LegalizeOpLevelAlias finished after 0.002 seconds +2025-08-07T13:53:52Z INFO 49124 
[sg0000/Tensorizer/OptimizeAliasedCopyChain]: Running OptimizeAliasedCopyChain +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: Finished (changed=False) +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/OptimizeAliasedCopyChain]: OptimizeAliasedCopyChain finished after 0.002 seconds +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=False) +2025-08-07T13:53:52Z INFO 49125 [sg0001/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-08-07T13:53:52Z INFO 49125 [sg0001/Tensorizer/LegalizeOpLevelAlias]: Running LegalizeOpLevelAlias +2025-08-07T13:53:52Z INFO 49125 [sg0001/Tensorizer/LegalizeOpLevelAlias]: Finished (changed=False) +2025-08-07T13:53:52Z INFO 49125 [sg0001/Tensorizer/LegalizeOpLevelAlias]: LegalizeOpLevelAlias finished after 0.001 seconds +2025-08-07T13:53:52Z INFO 49125 [sg0001/Tensorizer/OptimizeAliasedCopyChain]: Running OptimizeAliasedCopyChain +2025-08-07T13:53:52Z INFO 49125 [sg0001/Tensorizer/OptimizeAliasedCopyChain]: Finished (changed=False) +2025-08-07T13:53:52Z INFO 49125 [sg0001/Tensorizer/OptimizeAliasedCopyChain]: OptimizeAliasedCopyChain finished after 0.001 seconds +2025-08-07T13:53:52Z INFO 49125 [sg0001/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-08-07T13:53:52Z INFO 49125 [sg0001/Tensorizer/AliasDependencyInduction]: Finished (changed=False) +2025-08-07T13:53:52Z INFO 49125 [sg0001/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.002 seconds +2025-08-07T13:53:52Z INFO 49125 [sg0001/Tensorizer/TransformConvOp]: Running TransformConvOp +2025-08-07T13:53:52Z INFO 49125 [sg0001/Tensorizer/TransformConvOp]: Finished (changed=False) +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-08-07T13:53:52Z INFO 49126 
[sg0002/Tensorizer/LegalizeOpLevelAlias]: Running LegalizeOpLevelAlias +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/LegalizeOpLevelAlias]: Finished (changed=False) +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.003 seconds +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/TransformConvOp]: Running TransformConvOp +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/TransformConvOp]: Finished (changed=False) +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/TransformConvOp]: TransformConvOp finished after 0.003 seconds +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/LowerTensorOp]: Running LowerTensorOp +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/LowerTensorOp]: Finished (changed=True) +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/LegalizeOpLevelAlias]: LegalizeOpLevelAlias finished after 0.002 seconds +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/OptimizeAliasedCopyChain]: Running OptimizeAliasedCopyChain +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/OptimizeAliasedCopyChain]: Finished (changed=False) +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/OptimizeAliasedCopyChain]: OptimizeAliasedCopyChain finished after 0.001 seconds +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/AliasDependencyInduction]: Finished (changed=False) +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.002 seconds +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/TransformConvOp]: Running TransformConvOp +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/TransformConvOp]: Finished (changed=False) +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/LowerTensorOp]: LowerTensorOp finished after 0.013 seconds +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/AliasDependencyReset]: Running 
AliasDependencyReset +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=False) +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=True) +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.007 seconds +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.029 seconds +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/TensorOpSimplifier]: Running TensorOpSimplifier +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/TensorOpSimplifier]: Finished (changed=True) +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/TransformConvOp]: TransformConvOp finished after 0.003 seconds +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/LowerTensorOp]: Running LowerTensorOp +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/LowerTensorOp]: Finished (changed=True) +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/TensorOpSimplifier]: TensorOpSimplifier finished after 0.007 seconds +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/CanonicalizeIR]: Running CanonicalizeIR +2025-08-07T13:53:52Z INFO 49124 [sg0000/Tensorizer/CanonicalizeIR]: Finished (changed=True) +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/LowerTensorOp]: LowerTensorOp finished after 0.011 seconds +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination 
+2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/AliasDependencyElimination]: Finished (changed=False) +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/AliasDependencyInduction]: Finished (changed=False) +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.005 seconds +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.028 seconds +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/TensorOpSimplifier]: Running TensorOpSimplifier +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/TensorOpSimplifier]: Finished (changed=True) +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/TensorOpSimplifier]: TensorOpSimplifier finished after 0.005 seconds +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/CanonicalizeIR]: Running CanonicalizeIR +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/CanonicalizeIR]: Finished (changed=True) +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/CanonicalizeIR]: CanonicalizeIR finished after 0.002 seconds +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/LegalizeCCOpLayout]: Running LegalizeCCOpLayout +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/LegalizeCCOpLayout]: Finished (changed=True) +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/LegalizeCCOpLayout]: LegalizeCCOpLayout finished after 0.003 seconds +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/ResolveComplicatePredicates]: Running ResolveComplicatePredicates +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/ResolveComplicatePredicates]: Finished (changed=False) +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/ResolveComplicatePredicates]: ResolveComplicatePredicates finished 
after 0.002 seconds +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/AffinePredicateResolution]: Running AffinePredicateResolution +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/AffinePredicateResolution]: Finished (changed=False) +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/AffinePredicateResolution]: AffinePredicateResolution finished after 0.001 seconds +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/EliminateDivs]: Running EliminateDivs +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/EliminateDivs]: Finished (changed=False) +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.003 seconds +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/PerfectLoopNest]: Finished (changed=False) +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.001 seconds +2025-08-07T13:53:52Z INFO 49126 [sg0002/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/Simplifier]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.013 seconds +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/TCTransform]: Running TCTransform +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/TCTransform]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/TCTransform]: TCTransform finished after 0.001 seconds +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/CommuteConcat]: Running CommuteConcat +2025-08-07T13:53:53Z INFO 
49126 [sg0002/Tensorizer/CommuteConcat]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.002 seconds +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/ExpandBatchNorm]: Running ExpandBatchNorm +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/ExpandBatchNorm]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/ExpandBatchNorm]: ExpandBatchNorm finished after 0.003 seconds +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/TCTransform]: Running TCTransform +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/TCTransform]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/TCTransform]: TCTransform finished after 0.001 seconds +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/EliminateDivs]: Running EliminateDivs +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/EliminateDivs]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.003 seconds +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.002 seconds +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/TensorOpTransform]: Running TensorOpTransform +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/CanonicalizeIR]: CanonicalizeIR finished after 0.003 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/LegalizeCCOpLayout]: Running LegalizeCCOpLayout +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/TensorOpTransform]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/LegalizeCCOpLayout]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/TensorOpTransform]: TensorOpTransform 
finished after 0.019 seconds +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/LateLowerTensorOp]: Running LateLowerTensorOp +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/LateLowerTensorOp]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/LateLowerTensorOp]: LateLowerTensorOp finished after 0.003 seconds +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/AliasDependencyElimination]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/AliasDependencyInduction]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.005 seconds +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.028 seconds +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/MemcpyElimination]: Running MemcpyElimination +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/LegalizeCCOpLayout]: LegalizeCCOpLayout finished after 0.003 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/ResolveComplicatePredicates]: Running ResolveComplicatePredicates +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/ResolveComplicatePredicates]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/MemcpyElimination]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/TransformConvOp]: TransformConvOp finished after 0.002 seconds +2025-08-07T13:53:53Z INFO 49125 
[sg0001/Tensorizer/LowerTensorOp]: Running LowerTensorOp +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/LowerTensorOp]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/MemcpyElimination]: MemcpyElimination finished after 0.028 seconds +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/LowerTensorOp]: LowerTensorOp finished after 0.012 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/AliasDependencyElimination]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/LoopFusion]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/AliasDependencyInduction]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/ResolveComplicatePredicates]: ResolveComplicatePredicates finished after 0.002 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/AffinePredicateResolution]: Running AffinePredicateResolution +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/AffinePredicateResolution]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/AffinePredicateResolution]: AffinePredicateResolution finished after 0.002 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/EliminateDivs]: Running EliminateDivs +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/EliminateDivs]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/EliminateDivs]: EliminateDivs finished 
after 0.005 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/PerfectLoopNest]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.002 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.006 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/Simplifier]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.082 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/TensorOpSimplifier]: Running TensorOpSimplifier +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/TensorOpSimplifier]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.018 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/TensorOpSimplifier]: TensorOpSimplifier finished after 0.006 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/CanonicalizeIR]: Running CanonicalizeIR +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/CanonicalizeIR]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/TCTransform]: Running TCTransform +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/TCTransform]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.002 seconds 
+2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/CommuteConcat]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.003 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/ExpandBatchNorm]: Running ExpandBatchNorm +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/ExpandBatchNorm]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/ExpandBatchNorm]: ExpandBatchNorm finished after 0.002 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/TCTransform]: Running TCTransform +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/TCTransform]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.002 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/EliminateDivs]: Running EliminateDivs +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/EliminateDivs]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.004 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/TensorOpTransform]: Running TensorOpTransform +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/CanonicalizeIR]: CanonicalizeIR finished after 0.002 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/LegalizeCCOpLayout]: Running LegalizeCCOpLayout +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/LegalizeCCOpLayout]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49124 
[sg0000/Tensorizer/TensorOpTransform]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/LoopFusion]: LoopFusion finished after 0.015 seconds +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/Rematerialization]: Running Rematerialization +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/Rematerialization]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/Rematerialization]: Rematerialization finished after 0.002 seconds +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/Simplifier]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/TensorOpTransform]: TensorOpTransform finished after 0.028 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/LateLowerTensorOp]: Running LateLowerTensorOp +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/LateLowerTensorOp]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/LegalizeCCOpLayout]: LegalizeCCOpLayout finished after 0.002 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/ResolveComplicatePredicates]: Running ResolveComplicatePredicates +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/ResolveComplicatePredicates]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/ResolveComplicatePredicates]: ResolveComplicatePredicates finished after 0.002 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/AffinePredicateResolution]: Running AffinePredicateResolution +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/AffinePredicateResolution]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/AffinePredicateResolution]: AffinePredicateResolution finished after 0.002 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/EliminateDivs]: Running EliminateDivs +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/EliminateDivs]: Finished (changed=False) 
+2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.005 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/PerfectLoopNest]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.001 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/LateLowerTensorOp]: LateLowerTensorOp finished after 0.005 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/Simplifier]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/AliasDependencyInduction]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.015 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.009 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.033 seconds +2025-08-07T13:53:53Z INFO 49124 
[sg0000/Tensorizer/MemcpyElimination]: Running MemcpyElimination +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.010 seconds +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/Delinearization]: Running Delinearization +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/Delinearization]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/Delinearization]: Delinearization finished after 0.010 seconds +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/DeadStoreElimination]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.002 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/TCTransform]: Running TCTransform +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/TCTransform]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/TCTransform]: TCTransform finished after 0.002 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/CommuteConcat]: Running CommuteConcat +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/CommuteConcat]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.002 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/ExpandBatchNorm]: Running ExpandBatchNorm +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/ExpandBatchNorm]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/MemcpyElimination]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/ExpandBatchNorm]: ExpandBatchNorm finished after 0.002 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/TCTransform]: Running TCTransform +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/TCTransform]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49124 
[sg0000/Tensorizer/MemcpyElimination]: MemcpyElimination finished after 0.109 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/TCTransform]: TCTransform finished after 0.002 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/EliminateDivs]: Running EliminateDivs +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/EliminateDivs]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/LoopFusion]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/EliminateDivs]: EliminateDivs finished after 0.005 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.029 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/Rematerialization]: Running Rematerialization +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/Rematerialization]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.003 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/TensorOpTransform]: Running TensorOpTransform +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/Rematerialization]: Rematerialization finished after 0.005 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/Simplifier]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/TensorOpTransform]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.013 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/Delinearization]: Running Delinearization 
+2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/Delinearization]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/TensorOpTransform]: TensorOpTransform finished after 0.033 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/LateLowerTensorOp]: Running LateLowerTensorOp +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/LateLowerTensorOp]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/LateLowerTensorOp]: LateLowerTensorOp finished after 0.005 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/AliasDependencyReset]: Running AliasDependencyReset +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/AliasDependencyElimination]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/AliasDependencyInduction]: Running AliasDependencyInduction +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/AliasDependencyInduction]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/AliasDependencyInduction]: AliasDependencyInduction finished after 0.008 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/AliasDependencyReset]: AliasDependencyReset finished after 0.029 seconds +2025-08-07T13:53:53Z INFO 49125 [sg0001/Tensorizer/MemcpyElimination]: Running MemcpyElimination +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.019 seconds +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/Simplifier]: Finished (changed=False) +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.003 seconds +2025-08-07T13:53:53Z INFO 49126 
[sg0002/Tensorizer/LICM]: Running LICM +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/LICM]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/LICM]: LICM finished after 0.002 seconds +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/Delinearization]: Running Delinearization +2025-08-07T13:53:53Z INFO 49126 [sg0002/Tensorizer/Delinearization]: Finished (changed=True) +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.008 seconds +2025-08-07T13:53:53Z INFO 49124 [sg0000/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/Delinearization]: Delinearization finished after 0.003 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LoopFusion]: Finished (changed=True) +2025-08-07T13:53:54Z INFO 49124 [sg0000/Tensorizer/DeadStoreElimination]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LoopFusion]: LoopFusion finished after 0.008 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/SimplifySlice]: Running SimplifySlice +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/SimplifySlice]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/MemcpyElimination]: Finished (changed=True) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/SimplifySlice]: SimplifySlice finished after 0.002 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LICM]: Running LICM +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LICM]: Finished (changed=True) +2025-08-07T13:53:54Z INFO 49124 [sg0000/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.038 seconds +2025-08-07T13:53:54Z INFO 49124 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:53:54Z INFO 49124 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49125 
[sg0001/Tensorizer/MemcpyElimination]: MemcpyElimination finished after 0.104 seconds +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion +2025-08-07T13:53:54Z INFO 49124 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.004 seconds +2025-08-07T13:53:54Z INFO 49124 [sg0000/Tensorizer/LICM]: Running LICM +2025-08-07T13:53:54Z INFO 49124 [sg0000/Tensorizer/LICM]: Finished (changed=True) +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/LoopFusion]: Finished (changed=True) +2025-08-07T13:53:54Z INFO 49124 [sg0000/Tensorizer/LICM]: LICM finished after 0.002 seconds +2025-08-07T13:53:54Z INFO 49124 [sg0000/Tensorizer/Delinearization]: Running Delinearization +2025-08-07T13:53:54Z INFO 49124 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/LoopFusion]: LoopFusion finished after 0.029 seconds +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/Rematerialization]: Running Rematerialization +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/Rematerialization]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/Rematerialization]: Rematerialization finished after 0.002 seconds +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/Simplifier]: Finished (changed=True) +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.009 seconds +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/Delinearization]: Running Delinearization +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/Delinearization]: Finished (changed=True) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LICM]: LICM finished after 0.003 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/Simplifier]: Finished (changed=True) +2025-08-07T13:53:54Z INFO 49125 
[sg0001/Tensorizer/Delinearization]: Delinearization finished after 0.012 seconds +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination +2025-08-07T13:53:54Z INFO 49124 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.003 seconds +2025-08-07T13:53:54Z INFO 49124 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion +2025-08-07T13:53:54Z INFO 49124 [sg0000/Tensorizer/LoopFusion]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49124 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.007 seconds +2025-08-07T13:53:54Z INFO 49124 [sg0000/Tensorizer/SimplifySlice]: Running SimplifySlice +2025-08-07T13:53:54Z INFO 49124 [sg0000/Tensorizer/SimplifySlice]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/DeadStoreElimination]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49124 [sg0000/Tensorizer/SimplifySlice]: SimplifySlice finished after 0.001 seconds +2025-08-07T13:53:54Z INFO 49124 [sg0000/Tensorizer/LICM]: Running LICM +2025-08-07T13:53:54Z INFO 49124 [sg0000/Tensorizer/LICM]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.041 seconds +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/Simplifier]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.005 seconds +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/LICM]: Running LICM +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/LICM]: Finished (changed=True) +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/LICM]: LICM finished after 0.002 seconds +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/Delinearization]: Running Delinearization +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/Delinearization]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 
49125 [sg0001/Tensorizer/Delinearization]: Delinearization finished after 0.004 seconds +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/LoopFusion]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/LoopFusion]: LoopFusion finished after 0.007 seconds +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/SimplifySlice]: Running SimplifySlice +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/SimplifySlice]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.006 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/ValueNumbering]: Running ValueNumbering +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/ValueNumbering]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.003 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LICM]: Running LICM +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LICM]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LICM]: LICM finished after 0.002 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/PadElimination]: Running PadElimination +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/PadElimination]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/PadElimination]: PadElimination finished after 0.000 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/Delinearization]: Running Delinearization +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/Delinearization]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/Delinearization]: Delinearization finished after 0.003 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LoopFusion]: Running LoopFusion +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LoopFusion]: Finished (changed=False) 
+2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LoopFusion]: LoopFusion finished after 0.005 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/Simplifier]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.003 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LICM]: Running LICM +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LICM]: Finished (changed=True) +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/SimplifySlice]: SimplifySlice finished after 0.002 seconds +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/LICM]: Running LICM +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/LICM]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/LICM]: LICM finished after 0.002 seconds +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/Simplifier]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.005 seconds +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/ValueNumbering]: Running ValueNumbering +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/ValueNumbering]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.004 seconds +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/LICM]: Running LICM +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/LICM]: Finished (changed=False) +2025-08-07T13:53:54Z 
INFO 49125 [sg0001/Tensorizer/LICM]: LICM finished after 0.003 seconds +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/PadElimination]: Running PadElimination +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/PadElimination]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/PadElimination]: PadElimination finished after 0.000 seconds +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/Delinearization]: Running Delinearization +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/Delinearization]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LICM]: LICM finished after 0.002 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/ValueNumbering]: Running ValueNumbering +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/ValueNumbering]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/Delinearization]: Delinearization finished after 0.004 seconds +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/LoopFusion]: Running LoopFusion +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/LoopFusion]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.002 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/TCTransform]: Running TCTransform +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/TCTransform]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/TCTransform]: TCTransform finished after 0.001 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/CommuteConcat]: Running CommuteConcat +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/CommuteConcat]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.002 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/RecognizeOpIdiom]: Finished 
(changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom finished after 0.004 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/MaskPropagation]: Running MaskPropagation +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/MaskPropagation]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.006 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/DeadStoreElimination]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.007 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/Recompute]: Running Recompute +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/Recompute]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/Recompute]: Recompute finished after 0.000 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/DeadCodeElimination]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.002 seconds +2025-08-07T13:53:54Z INFO 49126 [Tensorizer]: After optimization: 38 statements +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/DoNothing]: Running DoNothing +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/DoNothing]: Finished (changed=True) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/MutateDataType]: Running MutateDataType +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/MutateDataType]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/MutateDataType]: MutateDataType finished 
after 0.001 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/Simplifier]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/Simplifier]: Simplifier finished after 0.004 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/TileCCOps]: Running TileCCOps +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/TileCCOps]: pass did not tile CC tensor due to `All gather output tensor check failed` +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/TileCCOps]: in float32 (512,) %'all_gather.2' = AllGatherOp-149 AllGather_add(float32 (256,) %'add.11', replica_groups = [[0, 1]],all_gather_dim = DimensionSet((512,), {0}),stream_id = -1) # dl = tensor_op_name: _all-gather.8843 | hlo_id: 101 | , id = 149 +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/TileCCOps]: pass did not tile CC tensor due to `multi_rank_size=2048 is not above min_allgather_tile_size_in_bytes=8388608` +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/TileCCOps]: in uint32 (512,) %'all_gather.3' = AllGatherOp-165 AllGather_add(uint32 (256,) %'add.12', replica_groups = [[0, 1]],all_gather_dim = DimensionSet((512,), {0}),stream_id = -1) # dl = tensor_op_name: _all-gather.8978 | hlo_id: 110 | , id = 165 +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/TileCCOps]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/TileCCOps]: TileCCOps finished after 0.006 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-08-07T13:53:54Z INFO 49126 
[sg0002/Tensorizer/DelinearIndices]: Finished (changed=True) +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/LoopFusion]: LoopFusion finished after 0.007 seconds +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/Simplifier]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.005 seconds +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/LICM]: Running LICM +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/LICM]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/LICM]: LICM finished after 0.002 seconds +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/ValueNumbering]: Running ValueNumbering +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/ValueNumbering]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.003 seconds +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/TCTransform]: Running TCTransform +2025-08-07T13:53:54Z INFO 49125 [sg0001/Tensorizer/TCTransform]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.007 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/Delinearization]: Running Delinearization +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/Delinearization]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/Delinearization]: Delinearization finished after 0.003 seconds +2025-08-07T13:53:54Z INFO 49126 
[sg0002/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.005 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/DeadCodeElimination]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.002 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LateLowerReshapeOp]: Running LateLowerReshapeOp +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LateLowerReshapeOp]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LateLowerReshapeOp]: LateLowerReshapeOp finished after 0.001 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/InferIntrinsicOnCC]: Running InferIntrinsicOnCC +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/InferIntrinsicOnCC]: Finished (changed=True) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/InferIntrinsicOnCC]: InferIntrinsicOnCC finished after 0.009 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/ResolveAccessConflict]: Running ResolveAccessConflict +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/ResolveAccessConflict]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/ResolveAccessConflict]: ResolveAccessConflict finished after 0.005 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LICM]: Running LICM +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LICM]: Finished (changed=True) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LICM]: LICM finished after 0.002 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LocalLayoutOpt]: Running LocalLayoutOpt +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LocalLayoutOpt]: Finished (changed=True) 
+2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LocalLayoutOpt]: LocalLayoutOpt finished after 0.014 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.005 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/PGLayoutTilingPipeline]: Running PGLayoutTilingPipeline +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.005 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LayoutPreprocessingAndAnalysis]: Running LayoutPreprocessingAndAnalysis +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LayoutPreprocessing]: Running LayoutPreprocessing +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/Delinearization]: Running Delinearization +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/Delinearization]: Finished (changed=False) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/Delinearization]: Delinearization finished after 0.004 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LayoutPreprocessing]: Finished (changed=True) +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LayoutPreprocessing]: LayoutPreprocessing finished after 0.027 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LayoutRequirementAnalysis]: Running LayoutRequirementAnalysis +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LayoutRequirementAnalysis]: LayoutRequirementAnalysis finished after 0.005 seconds +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/LayoutPreprocessingAndAnalysis]: LayoutPreprocessingAndAnalysis finished after 0.056 seconds 
+2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/InferNonlocalTensors]: Running InferNonlocalTensors +2025-08-07T13:53:54Z INFO 49126 [sg0002/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/InferNonlocalTensors]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/LICM]: LICM finished after 0.003 seconds +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/InferNonlocalTensors]: InferNonlocalTensors finished after 0.015 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PAGLayoutOpt]: Running PAGLayoutOpt +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/ParAxesAnnotation]: Running ParAxesAnnotation +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/LayoutSearchAlgorithm]: prefer_non_broadcast_par: True +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.005 seconds +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/ValueNumbering]: Running ValueNumbering +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/ValueNumbering]: Finished (changed=True) +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.003 seconds +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/LICM]: Running LICM +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/LICM]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/LICM]: LICM finished after 0.002 seconds +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/PadElimination]: Running PadElimination +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/PadElimination]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49126 
[sg0002/Tensorizer/ParAxesAnnotation]: Finished (changed=True) +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/PadElimination]: PadElimination finished after 0.000 seconds +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/Delinearization]: Running Delinearization +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/ParAxesAnnotation]: ParAxesAnnotation finished after 0.052 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/InsertLocalTransposes]: Running InsertLocalTransposes +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/InsertLocalTransposes]: Finished (changed=True) +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.004 seconds +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/LoopFusion]: Running LoopFusion +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/LoopFusion]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/InsertLocalTransposes]: InsertLocalTransposes finished after 0.004 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PAGLayoutOpt]: PAGLayoutOpt finished after 0.089 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/MaskPropagation]: Running MaskPropagation +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/MaskPropagation]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/LoopFusion]: LoopFusion finished after 0.006 seconds +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.003 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/CanonicalizeDAGForPGTiling]: Running CanonicalizeDAGForPGTiling +2025-08-07T13:53:55Z INFO 49126 
[sg0002/Tensorizer/CanonicalizeDAGForPGTiling]: Finished (changed=True) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/CanonicalizeDAGForPGTiling]: CanonicalizeDAGForPGTiling finished after 0.004 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.004 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PGTiling]: Running PGTiling +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/AGOrderingAnalysisPass]: Running AGOrderingAnalysisPass +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 585 of IO tensor {'CrossPassTensor': ''}bfloat16 %input471|NC|(128, 32) is not sorted, index list (w/ AG ids): [(16, 'AG49'), (13, 'AG50')] +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 586 of IO tensor {'CrossPassTensor': ''}bfloat16 %input472|NHWC|(2, 24, 128, 32, 128) is not sorted, index list (w/ AG ids): [(16, 'AG49'), (13, 'AG50')] +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 587 of IO tensor {'CrossPassTensor': ''}bfloat16 %input470|NHWC|(2, 24, 128, 32, 128) is not sorted, index list (w/ AG ids): [(16, 'AG49'), (13, 'AG50')] +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 588 of IO tensor {'CrossPassTensor': ''}bfloat16 %input469(32, 2, 128, 24, 128) is not sorted, index list (w/ AG ids): [(10, 'AG54'), (15, 'AG52'), (11, 'AG53')] +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 589 of IO tensor {'CrossPassTensor': ''}bfloat16 %input474|NC|(128, 32) is not sorted, index list (w/ AG ids): [(16, 'AG49'), (13, 'AG50')] 
+2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 540 of IO tensor {'CrossPassTensor': ''}bfloat16 %input473|NC|(75968, 32, 128) is not sorted, index list (w/ AG ids): [(14, 'AG59'), (13, 'AG50')] +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/TCTransform]: TCTransform finished after 0.002 seconds +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/CommuteConcat]: Running CommuteConcat +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/CommuteConcat]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/AGOrderingAnalysisPass]: AGOrderingAnalysisPass finished after 0.018 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/StaticTransposeLocalTensor]: Running StaticTransposeLocalTensor +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/StaticTransposeLocalTensor]: Finished (changed=True) +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.002 seconds +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/RecognizeOpIdiom]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/StaticTransposeLocalTensor]: StaticTransposeLocalTensor finished after 0.004 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PComputeCutting]: Running PComputeCutting +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PComputeCutting]: Finished (changed=True) +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom finished after 0.005 seconds +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/MaskPropagation]: Running MaskPropagation +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/MaskPropagation]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PComputeCutting]: PComputeCutting finished after 0.005 seconds +2025-08-07T13:53:55Z INFO 49126 
[sg0002/Tensorizer/BFComputeCutting]: Running BFComputeCutting +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/BFComputeCutting]: Finished (changed=True) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/BFComputeCutting]: BFComputeCutting finished after 0.004 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/LoopSplitting]: Running LoopSplitting +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/LoopSplitting]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/LoopSplitting]: LoopSplitting finished after 0.000 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/MacroGeneration]: Running MacroGeneration +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.003 seconds +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/MacroGeneration]: Finished (changed=True) +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.001 seconds +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/MacroGeneration]: MacroGeneration finished after 0.034 seconds +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/DeadStoreElimination]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PGTiling]: PGTiling finished after 0.165 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/InsertIOTransposes]: Running InsertIOTransposes +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.035 seconds +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/Recompute]: Running Recompute +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/Recompute]: Finished (changed=False) 
+2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/InsertIOTransposes]: Finished (changed=True) +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/Recompute]: Recompute finished after 0.000 seconds +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/DeadCodeElimination]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.002 seconds +2025-08-07T13:53:55Z INFO 49125 [Tensorizer]: After optimization: 25 statements +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/DoNothing]: Running DoNothing +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/DoNothing]: Finished (changed=True) +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/MutateDataType]: Running MutateDataType +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/MutateDataType]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/MutateDataType]: MutateDataType finished after 0.001 seconds +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.002 seconds +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/Simplifier]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/Simplifier]: Simplifier finished after 0.005 seconds +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/TileCCOps]: Running TileCCOps +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/TileCCOps]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 
49126 [sg0002/Tensorizer/InsertIOTransposes]: InsertIOTransposes finished after 0.020 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/InsertOffloadedTransposes]: Running InsertOffloadedTransposes +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/InsertOffloadedTransposes]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/TileCCOps]: TileCCOps finished after 0.006 seconds +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/DelinearIndices]: Finished (changed=True) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/InsertOffloadedTransposes]: InsertOffloadedTransposes finished after 0.003 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/DramToDramTranspose]: Running DramToDramTranspose +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.013 seconds +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/Delinearization]: Running Delinearization +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/DramToDramTranspose]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/Delinearization]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/DramToDramTranspose]: DramToDramTranspose finished after 0.020 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PGLayoutTilingPipeline]: PGLayoutTilingPipeline finished after 0.625 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingProfiler]: Running TilingProfiler +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingBottleneck]: +20 MACROS WITH LARGEST INSTRUCTION COUNTS: +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingBottleneck]: 19008: transpose_128x128 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingBottleneck]: 19008: matmul_128x128x1 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingBottleneck]: 3072: matmul_128x128x512 
+2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingBottleneck]: 3072: matmul_128x128x512 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingBottleneck]: 3072: matmul_128x128x512 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingBottleneck]: 256: transpose_128x128 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingBottleneck]: 256: transpose_128x128 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingBottleneck]: 256: transpose_128x128 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingBottleneck]: 96: simd128x512 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingBottleneck]: 64: rmsnorm128x512x128 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingBottleneck]: 64: simd128x512 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingBottleneck]: 64: rmsnorm128x512x128 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingBottleneck]: 2: reduce512x1x1 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingBottleneck]: 2: simd1x512 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingBottleneck]: 2: reduce512x1x1 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingBottleneck]: 2: indirect_load128x1 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingBottleneck]: 1: simd1x1 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingBottleneck]: 1: simd1x1 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingBottleneck]: 1: simd1x1 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingBottleneck]: 1: indirect_load32x128 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingProfiler]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/TilingProfiler]: TilingProfiler finished after 0.007 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/FlattenMacroLoop]: 
FlattenMacroLoop finished after 0.004 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/InferNeuronTensor]: Finished (changed=True) +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/Delinearization]: Delinearization finished after 0.005 seconds +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/InferNeuronTensor]: InferNeuronTensor finished after 0.026 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.006 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/LICM]: Running LICM +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/LICM]: Finished (changed=True) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/LICM]: LICM finished after 0.004 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/RewriteReplicationMatmul]: Running RewriteReplicationMatmul +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/RewriteReplicationMatmul]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/RewriteReplicationMatmul]: RewriteReplicationMatmul finished after 0.002 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.010 seconds +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination +2025-08-07T13:53:55Z 
INFO 49125 [sg0001/Tensorizer/DeadCodeElimination]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.004 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/SimplifyMacroPredicates]: Finished (changed=True) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.007 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/DataLocalityOpt]: Running DataLocalityOpt +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.002 seconds +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/LateLowerReshapeOp]: Running LateLowerReshapeOp +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/LateLowerReshapeOp]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/LateLowerReshapeOp]: LateLowerReshapeOp finished after 0.001 seconds +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/InferIntrinsicOnCC]: Running InferIntrinsicOnCC +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/InferIntrinsicOnCC]: Finished (changed=True) +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.005 seconds +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/LICM]: Running LICM +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/LICM]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/LICM]: LICM finished after 0.003 seconds +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/ValueNumbering]: Running ValueNumbering +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/ValueNumbering]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/ValueNumbering]: ValueNumbering finished after 0.003 seconds +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/TCTransform]: Running 
TCTransform +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/TCTransform]: Finished (changed=True) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/DataLocalityOpt]: Finished (changed=True) +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/TCTransform]: TCTransform finished after 0.002 seconds +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/CommuteConcat]: Running CommuteConcat +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/CommuteConcat]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/CommuteConcat]: CommuteConcat finished after 0.001 seconds +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/RecognizeOpIdiom]: Running RecognizeOpIdiom +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/RecognizeOpIdiom]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/RecognizeOpIdiom]: RecognizeOpIdiom finished after 0.004 seconds +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/MaskPropagation]: Running MaskPropagation +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/MaskPropagation]: Finished (changed=True) +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/InferIntrinsicOnCC]: InferIntrinsicOnCC finished after 0.010 seconds +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/ResolveAccessConflict]: Running ResolveAccessConflict +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/ResolveAccessConflict]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/ResolveAccessConflict]: ResolveAccessConflict finished after 0.004 seconds +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/LICM]: Running LICM +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/LICM]: Finished (changed=True) +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/LICM]: LICM finished after 0.003 seconds +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/LocalLayoutOpt]: Running LocalLayoutOpt +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/LocalLayoutOpt]: Finished (changed=True) 
+2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.005 seconds +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/DeadStoreElimination]: Running DeadStoreElimination +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/LocalLayoutOpt]: LocalLayoutOpt finished after 0.024 seconds +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/DeadStoreElimination]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/DataLocalityOpt]: DataLocalityOpt finished after 0.076 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/DMATilingProfiler]: Running DMATilingProfiler +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PostDLOTilingBottleneck]: +20 MACROS WITH LARGEST INSTRUCTION COUNTS: +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 19008: transpose_128x128 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 19008: matmul_128x128x1 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 3072: matmul_128x128x512 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 3072: matmul_128x128x512 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 3072: matmul_128x128x512 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 594: transpose_128x1 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 384: dma128x512 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 256: transpose_128x128 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 256: transpose_128x128 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 256: transpose_128x128 +2025-08-07T13:53:55Z INFO 49126 
[sg0002/Tensorizer/PostDLOTilingBottleneck]: 96: dma128x2048 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 96: dma128x2048 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 96: simd128x512 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 64: rmsnorm128x512x128 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 64: simd128x512 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 64: rmsnorm128x512x128 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 32: dma128x1024 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 2: reduce512x1x1 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 2: simd1x512 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/PostDLOTilingBottleneck]: 2: reduce512x1x1 +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/DMATilingProfiler]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/DeadStoreElimination]: DeadStoreElimination finished after 0.031 seconds +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/Recompute]: Running Recompute +2025-08-07T13:53:55Z INFO 49124 [sg0000/Tensorizer/Recompute]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/DMATilingProfiler]: DMATilingProfiler finished after 0.004 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.010 seconds +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/PGLayoutTilingPipeline]: Running PGLayoutTilingPipeline +2025-08-07T13:53:55Z INFO 49125 [sg0001/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis +2025-08-07T13:53:55Z INFO 49125 
[sg0001/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.007 seconds +2025-08-07T13:53:55Z INFO 49126 [sg0002/Tensorizer/LegalizeSundaMacro]: Running LegalizeSundaMacro +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.004 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/LayoutPreprocessingAndAnalysis]: Running LayoutPreprocessingAndAnalysis +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/LayoutPreprocessing]: Running LayoutPreprocessing +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/LegalizeSundaMacro]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/Delinearization]: Running Delinearization +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/LegalizeSundaMacro]: LegalizeSundaMacro finished after 0.012 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/Delinearization]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/Delinearization]: Delinearization finished after 0.004 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/LayoutPreprocessing]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.008 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/PerfectLoopNest]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/LayoutPreprocessing]: LayoutPreprocessing finished after 0.031 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/LayoutRequirementAnalysis]: Running LayoutRequirementAnalysis 
+2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/LayoutRequirementAnalysis]: LayoutRequirementAnalysis finished after 0.007 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/LayoutPreprocessingAndAnalysis]: LayoutPreprocessingAndAnalysis finished after 0.064 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/InferNonlocalTensors]: Running InferNonlocalTensors +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/Recompute]: Recompute finished after 0.000 seconds +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/DeadCodeElimination]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.001 seconds +2025-08-07T13:53:56Z INFO 49124 [Tensorizer]: After optimization: 26 statements +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/DoNothing]: Running DoNothing +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/DoNothing]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/InferNonlocalTensors]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/MutateDataType]: Running MutateDataType +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/MutateDataType]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/MutateDataType]: MutateDataType finished after 0.001 seconds +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/GenericAccessSimplifier]: Running GenericAccessSimplifier +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/GenericAccessSimplifier]: Finished (changed=False) 
+2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/GenericAccessSimplifier]: GenericAccessSimplifier finished after 0.002 seconds +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/Simplifier]: Running Simplifier +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/Simplifier]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/InferNonlocalTensors]: InferNonlocalTensors finished after 0.031 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/PAGLayoutOpt]: Running PAGLayoutOpt +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/ParAxesAnnotation]: Running ParAxesAnnotation +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/LayoutSearchAlgorithm]: prefer_non_broadcast_par: True +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.002 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/Simplifier]: Simplifier finished after 0.005 seconds +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/TileCCOps]: Running TileCCOps +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/TileCCOps]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/TileCCOps]: TileCCOps finished after 0.008 seconds +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.007 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/RewriteWeights]: Running RewriteWeights +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/RewriteWeights]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/RewriteWeights]: RewriteWeights finished after 
0.003 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/ReshapeWeights]: Running ReshapeWeights +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/ReshapeWeights]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/ParAxesAnnotation]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/ReshapeWeights]: ReshapeWeights finished after 0.001 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/FlattenMacroLoop]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/ParAxesAnnotation]: ParAxesAnnotation finished after 0.092 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/InsertLocalTransposes]: Running InsertLocalTransposes +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/InsertLocalTransposes]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.002 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/SimplifyMacroPredicates]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/InsertLocalTransposes]: InsertLocalTransposes finished after 0.007 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/PAGLayoutOpt]: PAGLayoutOpt finished after 0.132 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/MaskPropagation]: Running MaskPropagation +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/MaskPropagation]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.003 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/CanonicalizeDAGForPGTiling]: Running CanonicalizeDAGForPGTiling +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/CanonicalizeDAGForPGTiling]: Finished 
(changed=True) +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/CanonicalizeDAGForPGTiling]: CanonicalizeDAGForPGTiling finished after 0.003 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.006 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/PGTiling]: Running PGTiling +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/AGOrderingAnalysisPass]: Running AGOrderingAnalysisPass +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.022 seconds +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/Delinearization]: Running Delinearization +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.004 seconds +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 669 of IO tensor {'CrossPassTensor': ''}bfloat16 %input86|NC|(128, 32) is not sorted, index list (w/ AG ids): [(15, 'AG88'), (11, 'AG89')] +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 670 of IO tensor {'CrossPassTensor': ''}bfloat16 %input87|NHWC|(2, 24, 128, 32, 128) is not sorted, index list (w/ AG ids): [(15, 'AG88'), (11, 'AG89')] +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 671 of IO tensor {'CrossPassTensor': ''}bfloat16 %input85|NHWC|(2, 24, 128, 32, 128) is not sorted, index list (w/ AG ids): [(15, 'AG88'), (11, 'AG89')] +2025-08-07T13:53:56Z INFO 49125 
[sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 672 of IO tensor {'CrossPassTensor': ''}bfloat16 %input84(32, 2, 128, 24, 128) is not sorted, index list (w/ AG ids): [(7, 'AG93'), (14, 'AG91'), (8, 'AG92')] +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 673 of IO tensor {'CrossPassTensor': ''}bfloat16 %input90|NC|(128, 32) is not sorted, index list (w/ AG ids): [(15, 'AG88'), (11, 'AG89')] +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 674 of IO tensor {'CrossPassTensor': ''}bfloat16 %input94(4, 4, 128, 32, 2, 64) is not sorted, index list (w/ AG ids): [(15, 'AG88'), (11, 'AG89')] +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 679 of IO tensor {'CrossPassTensor': ''}bfloat16 %input92(4, 128, 32, 2, 64) is not sorted, index list (w/ AG ids): [(15, 'AG88'), (11, 'AG89')] +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 680 of IO tensor {'CrossPassTensor': ''}bfloat16 %input89|NHWC|(4, 128, 32, 128) is not sorted, index list (w/ AG ids): [(15, 'AG88'), (11, 'AG89')] +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 447 of IO tensor {'CrossPassTensor': ''}bfloat16 %input88(16, 128, 4, 4, 2, 128) is not sorted, index list (w/ AG ids): [(2, 'AG103'), (0, 'AG99'), (1, 'AG98'), (3, 'AG102'), (4, 'AG101')] +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/AGOrderingAnalysisPass]: AGOrderingAnalysisPass finished after 0.034 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/StaticTransposeLocalTensor]: Running StaticTransposeLocalTensor +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/StaticTransposeLocalTensor]: Finished (changed=True) 
+2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/DelinearIndices]: DelinearIndices finished after 0.010 seconds +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/DeadCodeElimination]: Running DeadCodeElimination +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/DeadCodeElimination]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/DeadCodeElimination]: DeadCodeElimination finished after 0.002 seconds +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/LateLowerReshapeOp]: Running LateLowerReshapeOp +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/LateLowerReshapeOp]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/LateLowerReshapeOp]: LateLowerReshapeOp finished after 0.002 seconds +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/InferIntrinsicOnCC]: Running InferIntrinsicOnCC +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/InferIntrinsicOnCC]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/StaticTransposeLocalTensor]: StaticTransposeLocalTensor finished after 0.005 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/PComputeCutting]: Running PComputeCutting +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/PComputeCutting]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/InferIntrinsicOnCC]: InferIntrinsicOnCC finished after 0.010 seconds +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/ResolveAccessConflict]: Running ResolveAccessConflict +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/ResolveAccessConflict]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/ResolveAccessConflict]: ResolveAccessConflict finished after 0.004 seconds +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/LICM]: Running LICM +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/LICM]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/LICM]: LICM finished after 0.002 seconds 
+2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/LocalLayoutOpt]: Running LocalLayoutOpt +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/LocalLayoutOpt]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/PComputeCutting]: PComputeCutting finished after 0.007 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/BFComputeCutting]: Running BFComputeCutting +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/BFComputeCutting]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/LocalLayoutOpt]: LocalLayoutOpt finished after 0.017 seconds +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/DelinearIndices]: Running DelinearIndices +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/DelinearIndices]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.011 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/InferInitValue]: Running InferInitValue +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/BFComputeCutting]: BFComputeCutting finished after 0.004 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/LoopSplitting]: Running LoopSplitting +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/LoopSplitting]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/LoopSplitting]: LoopSplitting finished after 0.000 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/MacroGeneration]: Running MacroGeneration +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/InferInitValue]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/InferInitValue]: InferInitValue finished after 0.028 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/DelinearIndices]: 
DelinearIndices finished after 0.009 seconds +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/PGLayoutTilingPipeline]: Running PGLayoutTilingPipeline +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/LowerCCOpBlockAxis]: Running LowerCCOpBlockAxis +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.008 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/SimplifyTensor]: Running SimplifyTensor +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/SimplifyTensor]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/SimplifyTensor]: SimplifyTensor finished after 0.006 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/LICM]: Running LICM +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/LICM]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/LICM]: LICM finished after 0.003 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/SundaISel]: Running SundaISel +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.005 seconds +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/LayoutPreprocessingAndAnalysis]: Running LayoutPreprocessingAndAnalysis +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/LayoutPreprocessing]: Running LayoutPreprocessing +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/Delinearization]: Running Delinearization +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/Delinearization]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/Delinearization]: Delinearization finished after 0.004 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/SundaISel]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/MacroGeneration]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49124 
[sg0000/Tensorizer/LayoutPreprocessing]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/SundaISel]: SundaISel finished after 0.044 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronAliasDependencyReset]: Running NeuronAliasDependencyReset +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/AliasDependencyElimination]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronAliasDependencyInduction]: Running NeuronAliasDependencyInduction +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronAliasDependencyInduction]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronAliasDependencyInduction]: NeuronAliasDependencyInduction finished after 0.000 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronAliasDependencyReset]: NeuronAliasDependencyReset finished after 0.025 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/LowerComplexBroadcast]: Running LowerComplexBroadcast +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/LowerComplexBroadcast]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/LowerComplexBroadcast]: LowerComplexBroadcast finished after 0.002 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronLoopInterchange]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/LayoutPreprocessing]: LayoutPreprocessing finished after 0.033 seconds +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/LayoutRequirementAnalysis]: Running LayoutRequirementAnalysis +2025-08-07T13:53:56Z INFO 49124 
[sg0000/Tensorizer/LayoutRequirementAnalysis]: LayoutRequirementAnalysis finished after 0.007 seconds +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/LayoutPreprocessingAndAnalysis]: LayoutPreprocessingAndAnalysis finished after 0.123 seconds +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/InferNonlocalTensors]: Running InferNonlocalTensors +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/MacroGeneration]: MacroGeneration finished after 0.128 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/PGTiling]: PGTiling finished after 0.452 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/InsertIOTransposes]: Running InsertIOTransposes +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/InsertIOTransposes]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.003 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/InsertIOTransposes]: InsertIOTransposes finished after 0.024 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/InsertOffloadedTransposes]: Running InsertOffloadedTransposes +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/InsertOffloadedTransposes]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/InferNonlocalTensors]: prefer_non_broadcast_par: True +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/InsertOffloadedTransposes]: InsertOffloadedTransposes finished after 0.004 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/DramToDramTranspose]: Running DramToDramTranspose +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates 
finished after 0.006 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronLoopFusion]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/DramToDramTranspose]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion finished after 0.008 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronLoopInterchange]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/DramToDramTranspose]: DramToDramTranspose finished after 0.042 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/PGLayoutTilingPipeline]: PGLayoutTilingPipeline finished after 0.933 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingProfiler]: Running TilingProfiler +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/InferNonlocalTensors]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingBottleneck]: +20 MACROS WITH LARGEST INSTRUCTION COUNTS: +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingBottleneck]: 3072: matmul_128x128x512 +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingBottleneck]: 3072: matmul_128x128x512 +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingBottleneck]: 3072: matmul_128x128x512 +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingBottleneck]: 1024: matmul_128x128x512 +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingBottleneck]: 1024: transpose_128x128 +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingBottleneck]: 1024: matmul_128x128x512 +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingBottleneck]: 256: transpose_128x128 +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingBottleneck]: 256: transpose_128x128 +2025-08-07T13:53:56Z INFO 49125 
[sg0001/Tensorizer/TilingBottleneck]: 256: transpose_128x128 +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingBottleneck]: 256: matmul_128x128x512 +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingBottleneck]: 256: matmul_128x128x512 +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingBottleneck]: 256: softmax512x1x128 +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingBottleneck]: 256: matmul_128x128x512 +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingBottleneck]: 256: matmul_128x128x512 +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingBottleneck]: 96: simd128x512 +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingBottleneck]: 64: rmsnorm128x512x128 +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingBottleneck]: 64: simd128x512 +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingBottleneck]: 64: rmsnorm128x512x128 +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingBottleneck]: 32: rmsnorm128x512x128 +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingBottleneck]: 32: simd64x512 +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingProfiler]: Finished (changed=False) +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/InferNonlocalTensors]: InferNonlocalTensors finished after 0.150 seconds +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/PAGLayoutOpt]: Running PAGLayoutOpt +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/ParAxesAnnotation]: Running ParAxesAnnotation +2025-08-07T13:53:56Z INFO 49124 [sg0000/Tensorizer/LayoutSearchAlgorithm]: prefer_non_broadcast_par: True +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/TilingProfiler]: TilingProfiler finished after 0.016 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.001 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronLICM]: Running 
NeuronLICM +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronLICM]: Finished (changed=True) +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.013 seconds +2025-08-07T13:53:56Z INFO 49125 [sg0001/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.006 seconds +2025-08-07T13:53:56Z INFO 49126 [sg0002/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/FactorizeBlkDims]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.010 seconds +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/InferNeuronTensor]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/NeuronInstComb]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/InferNeuronTensor]: InferNeuronTensor finished after 0.056 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.028 seconds +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/NeuronValueNumbering]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.009 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/LICM]: Running LICM +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/LICM]: Finished 
(changed=True) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/LICM]: LICM finished after 0.003 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/RewriteReplicationMatmul]: Running RewriteReplicationMatmul +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/RewriteReplicationMatmul]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/RewriteReplicationMatmul]: RewriteReplicationMatmul finished after 0.002 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.008 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/SimplifyMacroPredicates]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.005 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/DataLocalityOpt]: Running DataLocalityOpt +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.004 seconds +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.005 seconds +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/VectorizeDMA]: Running VectorizeDMA +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/VectorizeDMA]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/VectorizeDMA]: VectorizeDMA finished after 0.002 seconds +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/NeuronSimplifyPredicates]: 
Running NeuronSimplifyPredicates +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.006 seconds +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/LegalizePartitionReduce]: Running LegalizePartitionReduce +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/LegalizePartitionReduce]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/LegalizePartitionReduce]: LegalizePartitionReduce finished after 0.002 seconds +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/DeConcat]: Running DeConcat +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/DeConcat]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/DeConcat]: DeConcat finished after 0.002 seconds +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/FactorizeThreadAxesInFreeDims]: Running FactorizeThreadAxesInFreeDims +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/FactorizeThreadAxesInFreeDims]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/FactorizeThreadAxesInFreeDims]: FactorizeThreadAxesInFreeDims finished after 0.002 seconds +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/PartialSimdFusion]: Running PartialSimdFusion +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/PartialSimdFusion]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/PartialSimdFusion]: PartialSimdFusion finished after 0.012 seconds +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/TritiumFusion]: Running TritiumFusion +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/ParAxesAnnotation]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/ParAxesAnnotation]: ParAxesAnnotation finished after 0.330 seconds +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/InsertLocalTransposes]: Running InsertLocalTransposes 
+2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/DataLocalityOpt]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/InsertLocalTransposes]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/DataLocalityOpt]: DataLocalityOpt finished after 0.150 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/DMATilingProfiler]: Running DMATilingProfiler +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PostDLOTilingBottleneck]: +20 MACROS WITH LARGEST INSTRUCTION COUNTS: +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 3072: matmul_128x128x512 +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 3072: matmul_128x128x512 +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 3072: matmul_128x128x512 +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 1024: matmul_128x128x512 +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 1024: transpose_128x128 +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 1024: matmul_128x128x512 +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 384: dma128x512 +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 256: transpose_128x128 +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 256: transpose_128x128 +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 256: transpose_128x128 +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 256: matmul_128x128x512 +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 256: matmul_128x128x512 +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 256: softmax512x1x128 +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 256: matmul_128x128x512 +2025-08-07T13:53:57Z INFO 49125 
[sg0001/Tensorizer/PostDLOTilingBottleneck]: 256: matmul_128x128x512 +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 128: dma128x512 +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 96: dma128x2048 +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 96: dma128x2048 +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 96: simd128x512 +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PostDLOTilingBottleneck]: 64: rmsnorm128x512x128 +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/DMATilingProfiler]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/InsertLocalTransposes]: InsertLocalTransposes finished after 0.008 seconds +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/TritiumFusion]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/PAGLayoutOpt]: PAGLayoutOpt finished after 0.368 seconds +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/MaskPropagation]: Running MaskPropagation +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/MaskPropagation]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/TritiumFusion]: TritiumFusion finished after 0.054 seconds +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/CCOpFusion]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/MaskPropagation]: MaskPropagation finished after 0.005 seconds +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: Running CanonicalizeDAGForPGTiling +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/CanonicalizeDAGForPGTiling]: CanonicalizeDAGForPGTiling finished after 0.003 seconds +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/LowerCCOpBlockAxis]: Running 
LowerCCOpBlockAxis +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/LowerCCOpBlockAxis]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.009 seconds +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/VectorizeMatMult]: Running VectorizeMatMult +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/VectorizeMatMult]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/LowerCCOpBlockAxis]: LowerCCOpBlockAxis finished after 0.006 seconds +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/PGTiling]: Running PGTiling +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/AGOrderingAnalysisPass]: Running AGOrderingAnalysisPass +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/VectorizeMatMult]: VectorizeMatMult finished after 0.007 seconds +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/PartialLoopFusion]: Running PartialLoopFusion +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/PartialLoopFusion]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/DMATilingProfiler]: DMATilingProfiler finished after 0.006 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/PartialLoopFusion]: PartialLoopFusion finished after 0.011 seconds +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.014 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/LegalizeSundaMacro]: Running LegalizeSundaMacro +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/LegalizeSundaMacro]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49126 
[sg0002/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.006 seconds +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 702 of IO tensor {'CrossPassTensor': ''}bfloat16 %input79|NC|(128, 2, 2, 8) is not sorted, index list (w/ AG ids): [(16, 'AG97'), (11, 'AG100'), (9, 'AG99'), (13, 'AG98')] +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 703 of IO tensor {'CrossPassTensor': ''}bfloat16 %input83(4, 4, 128, 2, 2, 8, 2, 64) is not sorted, index list (w/ AG ids): [(16, 'AG97'), (11, 'AG100'), (9, 'AG99'), (13, 'AG98')] +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 704 of IO tensor {'CrossPassTensor': ''}bfloat16 %input81(4, 128, 2, 2, 8, 2, 64) is not sorted, index list (w/ AG ids): [(16, 'AG97'), (11, 'AG100'), (9, 'AG99'), (13, 'AG98')] +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: P dims of loadstore 705 of IO tensor {'CrossPassTensor': ''}bfloat16 %input78|NHWC|(4, 128, 2, 2, 8, 128) is not sorted, index list (w/ AG ids): [(16, 'AG97'), (11, 'AG100'), (9, 'AG99'), (13, 'AG98')] +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 459 of IO tensor {'CrossPassTensor': ''}bfloat16 %input77(16, 128, 4, 4, 2, 128) is not sorted, index list (w/ AG ids): [(10, 'AG111'), (5, 'AG107'), (8, 'AG106'), (12, 'AG110'), (14, 'AG109')] +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/AGOrderingAnalysisPass]: WARNING: non P dims of loadstore 569 of IO tensor {'IntermediateTensor': ''}bfloat16 %intermediate1(1024, 2, 2, 8, 128) is not sorted, index list (w/ AG ids): [(15, 'AG101'), (11, 'AG100'), (9, 'AG99'), (13, 'AG98')] +2025-08-07T13:53:57Z INFO 49126 [sg0002/Tensorizer/LowerTranspose]: Finished (changed=True) 
+2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/AGOrderingAnalysisPass]: AGOrderingAnalysisPass finished after 0.082 seconds +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/StaticTransposeLocalTensor]: Running StaticTransposeLocalTensor +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/StaticTransposeLocalTensor]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/LegalizeSundaMacro]: LegalizeSundaMacro finished after 0.013 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/StaticTransposeLocalTensor]: StaticTransposeLocalTensor finished after 0.005 seconds +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/PComputeCutting]: Running PComputeCutting +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/PComputeCutting]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.014 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PerfectLoopNest]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/PComputeCutting]: PComputeCutting finished after 0.009 seconds +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/BFComputeCutting]: Running BFComputeCutting +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/BFComputeCutting]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.002 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/BFComputeCutting]: BFComputeCutting 
finished after 0.004 seconds +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/LoopSplitting]: Running LoopSplitting +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/LoopSplitting]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/LoopSplitting]: LoopSplitting finished after 0.000 seconds +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/MacroGeneration]: Running MacroGeneration +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.011 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/RewriteWeights]: Running RewriteWeights +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/RewriteWeights]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/RewriteWeights]: RewriteWeights finished after 0.004 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/ReshapeWeights]: Running ReshapeWeights +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/ReshapeWeights]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/ReshapeWeights]: ReshapeWeights finished after 0.001 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/FlattenMacroLoop]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.003 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/SimplifyMacroPredicates]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.007 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/InferInitValue]: Running InferInitValue +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/MacroGeneration]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 
49124 [sg0000/Tensorizer/MacroGeneration]: MacroGeneration finished after 0.081 seconds +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/PGTiling]: PGTiling finished after 0.270 seconds +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/InsertIOTransposes]: Running InsertIOTransposes +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/InferInitValue]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/InsertIOTransposes]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/InferInitValue]: InferInitValue finished after 0.043 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/InsertIOTransposes]: InsertIOTransposes finished after 0.016 seconds +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/InsertOffloadedTransposes]: Running InsertOffloadedTransposes +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/InsertOffloadedTransposes]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.013 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/SimplifyTensor]: Running SimplifyTensor +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/SimplifyTensor]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/SimplifyTensor]: SimplifyTensor finished after 0.007 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/LICM]: Running LICM +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/LICM]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/LICM]: LICM finished after 0.004 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/SundaISel]: Running SundaISel +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/InsertOffloadedTransposes]: InsertOffloadedTransposes finished after 0.003 
seconds +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/DramToDramTranspose]: Running DramToDramTranspose +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/SundaISel]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/SundaISel]: SundaISel finished after 0.045 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronAliasDependencyReset]: Running NeuronAliasDependencyReset +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/AliasDependencyElimination]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/DramToDramTranspose]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 0.000 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronAliasDependencyInduction]: Running NeuronAliasDependencyInduction +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronAliasDependencyInduction]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronAliasDependencyInduction]: NeuronAliasDependencyInduction finished after 0.000 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronAliasDependencyReset]: NeuronAliasDependencyReset finished after 0.020 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/LowerComplexBroadcast]: Running LowerComplexBroadcast +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/LowerComplexBroadcast]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/LowerComplexBroadcast]: LowerComplexBroadcast finished after 0.002 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronLoopInterchange]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49125 
[sg0001/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.002 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.002 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronLoopFusion]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/DramToDramTranspose]: DramToDramTranspose finished after 0.035 seconds +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/PGLayoutTilingPipeline]: PGLayoutTilingPipeline finished after 1.321 seconds +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingProfiler]: Running TilingProfiler +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingBottleneck]: +20 MACROS WITH LARGEST INSTRUCTION COUNTS: +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingBottleneck]: 1024: matmul_128x128x512 +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingBottleneck]: 1024: transpose_128x128 +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingBottleneck]: 1024: matmul_128x128x512 +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingBottleneck]: 256: transpose_128x128 +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingBottleneck]: 256: matmul_128x128x512 +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingBottleneck]: 256: matmul_128x128x512 +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingBottleneck]: 256: softmax512x1x128 +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingBottleneck]: 256: matmul_128x128x512 +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingBottleneck]: 256: matmul_128x128x512 +2025-08-07T13:53:57Z INFO 49124 
[sg0000/Tensorizer/TilingBottleneck]: 128: transpose_128x128 +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingBottleneck]: 128: transpose_128x128 +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingBottleneck]: 128: transpose_128x128 +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingBottleneck]: 64: simd128x512 +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingBottleneck]: 64: rmsnorm128x512x128 +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingBottleneck]: 32: indirect_load128x512 +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingBottleneck]: 32: rmsnorm128x512x128 +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingBottleneck]: 32: simd128x256 +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingBottleneck]: 32: simd128x256 +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingBottleneck]: 32: simd128x512 +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingBottleneck]: 32: transpose_128x128 +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingProfiler]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion finished after 0.023 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronLoopInterchange]: Finished (changed=False) +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/TilingProfiler]: TilingProfiler finished after 0.016 seconds +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.003 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronLICM]: Finished (changed=True) 
+2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.017 seconds +2025-08-07T13:53:57Z INFO 49124 [sg0000/Tensorizer/InferNeuronTensor]: Running InferNeuronTensor +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.009 seconds +2025-08-07T13:53:57Z INFO 49125 [sg0001/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/FactorizeBlkDims]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.022 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/InferNeuronTensor]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/InferNeuronTensor]: InferNeuronTensor finished after 0.051 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.009 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/LICM]: Running LICM +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/LICM]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/LICM]: LICM finished after 0.003 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/RewriteReplicationMatmul]: Running RewriteReplicationMatmul +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/RewriteReplicationMatmul]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/RewriteReplicationMatmul]: RewriteReplicationMatmul finished after 0.003 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-08-07T13:53:58Z INFO 49125 
[sg0001/Tensorizer/NeuronInstComb]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.061 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/NeuronValueNumbering]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.008 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.003 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/SimplifyMacroPredicates]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.014 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/DataLocalityOpt]: Running DataLocalityOpt +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.009 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/VectorizeDMA]: Running VectorizeDMA +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/VectorizeDMA]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/VectorizeDMA]: VectorizeDMA finished after 0.002 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49125 
[sg0001/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.001 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/LegalizePartitionReduce]: Running LegalizePartitionReduce +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/LegalizePartitionReduce]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/LegalizePartitionReduce]: LegalizePartitionReduce finished after 0.003 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/DeConcat]: Running DeConcat +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/DeConcat]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/DeConcat]: DeConcat finished after 0.002 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/FactorizeThreadAxesInFreeDims]: Running FactorizeThreadAxesInFreeDims +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/FactorizeThreadAxesInFreeDims]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/FactorizeThreadAxesInFreeDims]: FactorizeThreadAxesInFreeDims finished after 0.002 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/PartialSimdFusion]: Running PartialSimdFusion +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.013 seconds +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.002 seconds +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/DataLocalityOpt]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/LateNeuronInstComb]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/PartialSimdFusion]: Finished (changed=True) 
+2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/DataLocalityOpt]: DataLocalityOpt finished after 0.114 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/DMATilingProfiler]: Running DMATilingProfiler +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PostDLOTilingBottleneck]: +20 MACROS WITH LARGEST INSTRUCTION COUNTS: +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1024: matmul_128x128x512 +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1024: transpose_128x128 +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 1024: matmul_128x128x512 +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 256: transpose_128x128 +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 256: matmul_128x128x512 +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 256: matmul_128x128x512 +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 256: softmax512x1x128 +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 256: matmul_128x128x512 +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 256: matmul_128x128x512 +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 128: transpose_128x128 +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 128: transpose_128x128 +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 128: transpose_128x128 +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 128: dma128x512 +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 64: dma128x512 +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 64: rmsnorm128x512x128 +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 32: indirect_load128x512 +2025-08-07T13:53:58Z INFO 49124 
[sg0000/Tensorizer/PostDLOTilingBottleneck]: 32: dma128x1024 +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 32: dma128x2048 +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 32: rmsnorm128x512x128 +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PostDLOTilingBottleneck]: 32: simd128x256 +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/DMATilingProfiler]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/PartialSimdFusion]: PartialSimdFusion finished after 0.043 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/TritiumFusion]: Running TritiumFusion +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/DMATilingProfiler]: DMATilingProfiler finished after 0.005 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.012 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/LegalizeSundaMacro]: Running LegalizeSundaMacro +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/LegalizeSundaMacro]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/LegalizeSundaMacro]: LegalizeSundaMacro finished after 0.010 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.010 seconds +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/SplitAccGrp]: Running SplitAccGrp +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/SplitAccGrp]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/SplitAccGrp]: SplitAccGrp finished after 0.001 seconds 
+2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/SpillPSum]: Running SpillPSum +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/SpillPSum]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.012 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PerfectLoopNest]: Running PerfectLoopNest +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PerfectLoopNest]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/PerfectLoopNest]: PerfectLoopNest finished after 0.002 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/SpillPSum]: SpillPSum finished after 0.013 seconds +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/TritiumFusion]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.011 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/RewriteWeights]: Running RewriteWeights +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/RewriteWeights]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/TritiumFusion]: TritiumFusion finished after 0.124 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/RewriteWeights]: RewriteWeights finished after 0.004 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/ReshapeWeights]: Running ReshapeWeights +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/ReshapeWeights]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/CCOpFusion]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49124 
[sg0000/Tensorizer/ReshapeWeights]: ReshapeWeights finished after 0.001 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/FlattenMacroLoop]: Running FlattenMacroLoop +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/FlattenMacroLoop]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.021 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/VectorizeMatMult]: Running VectorizeMatMult +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/VectorizeMatMult]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/FlattenMacroLoop]: FlattenMacroLoop finished after 0.004 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/SimplifyMacroPredicates]: Running SimplifyMacroPredicates +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/VectorizeMatMult]: VectorizeMatMult finished after 0.019 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/PartialLoopFusion]: Running PartialLoopFusion +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/SimplifyMacroPredicates]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/SimplifyMacroPredicates]: SimplifyMacroPredicates finished after 0.016 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/InferInitValue]: Running InferInitValue +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/PartialLoopFusion]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/PartialLoopFusion]: PartialLoopFusion finished after 0.021 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.006 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/InferInitValue]: Finished 
(changed=True) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/LowerTranspose]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/InferInitValue]: InferInitValue finished after 0.035 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronSimplifier]: Running NeuronSimplifier +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronSimplifier]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.013 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.002 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronSimplifier]: NeuronSimplifier finished after 0.012 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/SimplifyTensor]: Running SimplifyTensor +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/LateNeuronInstComb]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/SimplifyTensor]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.015 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/SplitAccGrp]: Running SplitAccGrp +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/SplitAccGrp]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/SplitAccGrp]: SplitAccGrp finished after 0.002 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/SpillPSum]: Running SpillPSum +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/SpillPSum]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/SimplifyTensor]: SimplifyTensor finished after 
0.007 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/LICM]: Running LICM +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/LICM]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/SpillPSum]: SpillPSum finished after 0.023 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/LICM]: LICM finished after 0.003 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/SundaISel]: Running SundaISel +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/LowerIntrinsics]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.043 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/InlineNativeKernels]: Running InlineNativeKernels +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/InlineNativeKernels]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/SundaISel]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/LowerIntrinsics]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/InlineNativeKernels]: InlineNativeKernels finished after 0.002 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/LegalizeType]: Running LegalizeType +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/LegalizeType]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/SundaISel]: SundaISel finished after 0.046 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronAliasDependencyReset]: Running NeuronAliasDependencyReset +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/AliasDependencyElimination]: Running AliasDependencyElimination +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/AliasDependencyElimination]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/AliasDependencyElimination]: AliasDependencyElimination finished after 
0.000 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: Running NeuronAliasDependencyInduction +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronAliasDependencyInduction]: NeuronAliasDependencyInduction finished after 0.000 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronAliasDependencyReset]: NeuronAliasDependencyReset finished after 0.022 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/LowerComplexBroadcast]: Running LowerComplexBroadcast +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/LowerComplexBroadcast]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/LowerComplexBroadcast]: LowerComplexBroadcast finished after 0.002 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronLoopInterchange]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.002 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.314 seconds +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/InlineNativeKernels]: Running InlineNativeKernels +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/InlineNativeKernels]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.010 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronLoopFusion]: Running NeuronLoopFusion +2025-08-07T13:53:58Z INFO 49126 
[sg0002/Tensorizer/InlineNativeKernels]: InlineNativeKernels finished after 0.009 seconds +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/LegalizeType]: Running LegalizeType +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronLoopFusion]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/LegalizeType]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/LegalizeType]: LegalizeType finished after 0.005 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/NeuronLICM]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronLoopFusion]: NeuronLoopFusion finished after 0.018 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronLoopInterchange]: Running NeuronLoopInterchange +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronLoopInterchange]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronLoopInterchange]: NeuronLoopInterchange finished after 0.003 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.010 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/LegalizeType]: LegalizeType finished after 0.012 seconds +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.014 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/InferPSumTensor]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49126 
[sg0002/Tensorizer/NeuronLICM]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/FactorizeBlkDims]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.046 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/WeightCoalescing]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.003 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.026 seconds +2025-08-07T13:53:58Z INFO 49124 [sg0000/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/LegalizeSundaAccess]: Finished (changed=True) +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.036 seconds +2025-08-07T13:53:58Z INFO 49126 [sg0002/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.016 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/RelaxPredicates]: Running RelaxPredicates +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/RelaxPredicates]: Finished (changed=False) +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/RelaxPredicates]: RelaxPredicates finished after 0.003 seconds +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/TensorInitialization]: Running TensorInitialization +2025-08-07T13:53:58Z INFO 49125 [sg0001/Tensorizer/TensorInitialization]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/NeuronInstComb]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/TensorInitialization]: 
TensorInitialization finished after 0.005 seconds +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.002 seconds +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/ExpandISAMacro]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.002 seconds +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/SimplifyNeuronTensor]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.047 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/NeuronValueNumbering]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.010 seconds +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/DMALocalityOpt]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.002 seconds +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/DataStreaming]: Running DataStreaming +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/DataStreaming]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.004 seconds +2025-08-07T13:53:59Z INFO 49124 
[sg0000/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/DataStreaming]: DataStreaming finished after 0.005 seconds +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.010 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/VectorizeDMA]: Running VectorizeDMA +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/VectorizeDMA]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/VectorizeDMA]: VectorizeDMA finished after 0.007 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.009 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/LegalizePartitionReduce]: Running LegalizePartitionReduce +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/LegalizePartitionReduce]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/LegalizePartitionReduce]: LegalizePartitionReduce finished after 0.001 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/DeConcat]: Running DeConcat +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/DeConcat]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/DeConcat]: DeConcat finished after 0.001 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: Running FactorizeThreadAxesInFreeDims +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49124 
[sg0000/Tensorizer/FactorizeThreadAxesInFreeDims]: FactorizeThreadAxesInFreeDims finished after 0.001 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/PartialSimdFusion]: Running PartialSimdFusion +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/PartialSimdFusion]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/PartialSimdFusion]: PartialSimdFusion finished after 0.035 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/TritiumFusion]: Running TritiumFusion +2025-08-07T13:53:59Z INFO 49126 [sg0002/Tensorizer/InferPSumTensor]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49126 [sg0002/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.284 seconds +2025-08-07T13:53:59Z INFO 49126 [sg0002/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-08-07T13:53:59Z INFO 49126 [sg0002/Tensorizer/WeightCoalescing]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49126 [sg0002/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.008 seconds +2025-08-07T13:53:59Z INFO 49126 [sg0002/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/TritiumFusion]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/TritiumFusion]: TritiumFusion finished after 0.082 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/CCOpFusion]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.201 seconds +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/LateLegalizeInst]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/CCOpFusion]: CCOpFusion finished after 
0.018 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/VectorizeMatMult]: Running VectorizeMatMult +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.004 seconds +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/VectorizeMatMult]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/CoalesceCCOp]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49126 [sg0002/Tensorizer/LegalizeSundaAccess]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/VectorizeMatMult]: VectorizeMatMult finished after 0.018 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/PartialLoopFusion]: Running PartialLoopFusion +2025-08-07T13:53:59Z INFO 49126 [sg0002/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.076 seconds +2025-08-07T13:53:59Z INFO 49126 [sg0002/Tensorizer/RelaxPredicates]: Running RelaxPredicates +2025-08-07T13:53:59Z INFO 49126 [sg0002/Tensorizer/RelaxPredicates]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/PartialLoopFusion]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.005 seconds +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/SimpleAllReduceTiling]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49126 [sg0002/Tensorizer/RelaxPredicates]: RelaxPredicates finished after 0.014 seconds +2025-08-07T13:53:59Z INFO 49126 [sg0002/Tensorizer/TensorInitialization]: Running TensorInitialization +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.004 seconds +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-08-07T13:53:59Z INFO 
49125 [sg0001/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 1.041ms (48.000MiB, est bw: 48.348GB/s, 45.083% of tot. time) for bfloat16<128 x 128> TongaSB partitions[5] bfloat16 (2, 2, 2, 2, 24, 128, 512) %'input84_local_915'[i15_0_0_921_0_0_1176,i15_0_0_921_0_1_1176,i15_0_0_1,c1_909,c2_910,i0.128,i1.128+128p_1377] = load bfloat16<128 x 128> {'CrossPassTensor': ''}bfloat16 (8, 4, 2, 128, 24, 128) %'input84'[4i15_0_0_921_0_0_1176+2i15_0_0_921_0_1_1176+i15_0_0_1,p_1377,c1_909,i0.128,c2_910,i1.128] # id=1086, src_id=None, , instances=1536 # dl = tensor_op_name: _dot.6 | hlo_id: 49 | [[i0.128];[i1.128]] -> [[i0.128];[i1.128]] +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 488.243us (96.000MiB, est bw: 206.175GB/s, 21.144% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[4] bfloat16 (2, 2, 24, 2, 128, 2048) %1177[i11_0,i10_0_0,i10_0_1,c2_890,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 24, 128, 4096) %'input85'[i10_0_0,i10_0_1,i0.128,i1.2048+2048c2_890] # id=1077, src_id=None, , instances=192 # dl = tensor_op_name: _dot.4 | hlo_id: 39 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-08-07T13:53:59Z INFO 49126 [sg0002/Tensorizer/TensorInitialization]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 244.771us (48.000MiB, est bw: 205.627GB/s, 10.600% of tot. 
time) for bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 24, 2, 128, 2048) %'input87_local_905'[i12_0_0,4i12_0_1_0+i12_0_1_1,c2_900,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 24, 128, 4096) %'input87'[i12_0_0,4i12_0_1_0+i12_0_1_1,i0.128,i1.2048+2048c2_900] # id=1080, src_id=None, , instances=96 # dl = tensor_op_name: _dot.5 | hlo_id: 30 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 163.614us (32.000MiB, est bw: 205.083GB/s, 7.086% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[4] bfloat16 (2, 4, 4, 2, 128, 2048) %1178[i40_0,i41_0,i41_1,c2_931,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (4, 4, 128, 4096) %'input94'[i41_0,i41_1,i0.128,i1.2048+2048c2_931] # id=1100, src_id=None, , instances=64 # dl = tensor_op_name: _dot.9 | hlo_id: 67 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 99.175us (16.000MiB, est bw: 169.167GB/s, 4.295% of tot. time) for bfloat16<128 x 1024> TongaSB partitions[4] bfloat16 (2, 2, 2, 4, 128, 4, 512) %'input88_local_1001'[i115_0_0_0_1007_0_0_1179,i115_0_0_0_1007_0_1_1179,i115_0_0_0_1,c1_994_1793,i0.128,i3.4,i1.128+128i2.2+256p_1392_1793] = load bfloat16<128 x 1024> {'CrossPassTensor': ''}bfloat16 (8, 2, 128, 4, 4, 2, 128) %'input88'[4i115_0_0_0_1007_0_0_1179+2i115_0_0_0_1007_0_1_1179+i115_0_0_0_1,p_1392_1793,i0.128,c1_994_1793,i3.4,i2.2,i1.128] # id=1156, src_id=None, , instances=64 # dl = tensor_op_name: _dot.10 | hlo_id: 165 | [[i0.128];[i1.128, i2.2, i3.4]] -> [[i0.128];[i1.128, i2.2, i3.4]] +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 41.879us (8.000MiB, est bw: 200.308GB/s, 1.814% of tot. 
time) for bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 4, 2, 128, 2048) %'842.1337'[i11_0,T_i1_0,T_i2_0_1790,i0.128,i1.2048] = load bfloat16<128 x 2048> non_local bfloat16 (2, 4, 128, 4096) %'add.4'[i11_0,T_i1_0,i0.128,i1.2048+2048T_i2_0_1790] # id=1181, src_id=None, , instances=16 # dl = tensor_op_name: add.4_pftranspose_842 | hlo_id: 17 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 41.879us (8.000MiB, est bw: 200.308GB/s, 1.814% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 2, 4, 128, 2048) %'846.1342'[i40_0,T_i17_0_854_0,2T_i1_0_0_1791+T_i1_0_1_1791,i0.128,i1.2048] = load bfloat16<128 x 2048> non_local bfloat16 (4194304,) %'all_reduce.1-buffer-1828'[2097152i40_0+4096i0.128+2048T_i17_0_854_0+i1.2048+1048576T_i1_0_0_1791+524288T_i1_0_1_1791] # id=1190, src_id=None, , instances=16 # dl = tensor_op_name: all_reduce.1_pftranspose_846 | hlo_id: 52 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 30.524us (8.000MiB, est bw: 274.819GB/s, 1.322% of tot. time) for bfloat16<128 x 1024> non_local bfloat16 (4194304,) %'dot.7-buffer-1826'[2048i15_0_0_921_0_0_1176+4096i0.128+1024i15_0_0_921_0_1_1176+i1.1024+2097152i16_0_0_921_1176+524288i16_0_1_921_1176] = store bfloat16<128 x 1024> TongaSB partitions[4] bfloat16 (2, 2, 2, 4, 128, 1024) %922[i15_0_0_921_0_0_1176,i15_0_0_921_0_1_1176,i16_0_0_921_1176,i16_0_1_921_1176,i0.128,i1.1024] # id=1089, src_id=None, , instances=32 # dl = tensor_op_name: _dot.6 | hlo_id: 49 | [[i0.128];[i1.1024]] -> [[i0.128];[i1.1024]] +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 30.524us (8.000MiB, est bw: 274.819GB/s, 1.322% of tot. 
time) for bfloat16<128 x 1024> non_local bfloat16 (4194304,) %'dot.11-buffer-1831'[2048i115_0_0_0_1007_0_0_1179+4096i0.128+1024i115_0_0_0_1007_0_1_1179+i1.1024+2097152i116_0_0_1007_1179+524288i116_0_1_1007_1179] = store bfloat16<128 x 1024> TongaSB partitions[4] bfloat16 (2, 2, 2, 4, 128, 1024) %1008[i115_0_0_0_1007_0_0_1179,i115_0_0_0_1007_0_1_1179,i116_0_0_1007_1179,i116_0_1_1007_1179,i0.128,i1.1024] # id=1159, src_id=None, , instances=32 # dl = tensor_op_name: _dot.10 | hlo_id: 165 | [[i0.128];[i1.1024]] -> [[i0.128];[i1.1024]] +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/DMAProfiler]: Est. DMA time: 25.532us (8.000MiB, est bw: 328.547GB/s, 1.106% of tot. time) for bfloat16<128 x 2048> {'IntermediateTensor': ''}bfloat16 (1, 2, 4, 128, 4096) %'intermediate6'(init=0.0)[0,i40_0,2T_i18_1_0_854_0+T_i18_1_0_854_1,i0.128,2048T_i17_0_854_0+i1.2048] = store bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 2, 4, 128, 2048) %'850.1397'[i40_0,T_i17_0_854_0,2T_i18_1_0_854_0+T_i18_1_0_854_1,i0.128,i1.2048] # id=1194, src_id=None, , instances=16 # dl = tensor_op_name: intermediate6_pftranspose_850 | hlo_id: 2 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49126 [sg0002/Tensorizer/TensorInitialization]: TensorInitialization finished after 0.016 seconds +2025-08-07T13:53:59Z INFO 49126 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.004 seconds +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/OptimizeNKIKernels]: Running OptimizeNKIKernels +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/OptimizeNKIKernels]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/OptimizeNKIKernels]: OptimizeNKIKernels finished after 0.002 seconds +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/CCOpFusion]: 
Running CCOpFusion +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/PartialLoopFusion]: PartialLoopFusion finished after 0.027 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.008 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/CCOpFusion]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/LowerTranspose]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.043 seconds +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/StaticProfiler]: Running StaticProfiler +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/StaticProfiler]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.013 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/StaticProfiler]: StaticProfiler finished after 0.004 seconds +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/SplitAPUnionSets]: Running SplitAPUnionSets +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.002 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/SplitAPUnionSets]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49126 [sg0002/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/LateNeuronInstComb]: Finished (changed=True) 
+2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/SplitAPUnionSets]: SplitAPUnionSets finished after 0.020 seconds +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/LateLegalizePostSplit]: Running LateLegalizePostSplit +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/LateLegalizePostSplit]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49126 [sg0002/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.120 seconds +2025-08-07T13:53:59Z INFO 49126 [sg0002/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/LateLegalizePostSplit]: LateLegalizePostSplit finished after 0.003 seconds +2025-08-07T13:53:59Z INFO 49126 [sg0002/Tensorizer/ExpandISAMacro]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/DumpGraphAndMetadata]: Running DumpGraphAndMetadata +2025-08-07T13:53:59Z INFO 49126 [sg0002/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.011 seconds +2025-08-07T13:53:59Z INFO 49126 [sg0002/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/DumpGraphAndMetadata]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/DumpGraphAndMetadata]: DumpGraphAndMetadata finished after 0.004 seconds +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/ZeroSizeTensorElimination]: Running ZeroSizeTensorElimination +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/ZeroSizeTensorElimination]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/ZeroSizeTensorElimination]: ZeroSizeTensorElimination finished after 0.000 seconds +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/BirCodeGenLoop]: Running BirCodeGenLoop +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.017 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/SplitAccGrp]: Running SplitAccGrp +2025-08-07T13:53:59Z INFO 
49124 [sg0000/Tensorizer/SplitAccGrp]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/SplitAccGrp]: SplitAccGrp finished after 0.002 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/SpillPSum]: Running SpillPSum +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/BirCodeGenLoop]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49125 [sg0001/Tensorizer/BirCodeGenLoop]: BirCodeGenLoop finished after 0.044 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/SpillPSum]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/SpillPSum]: SpillPSum finished after 0.022 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-08-07T13:53:59Z INFO 49125 [Tensorizer]: BirCodeGen estimate #instances=20003 in sg0001 +2025-08-07T13:53:59Z INFO 49125 [Tensorizer]: IR signature: fa0435b1d147525f8e0db1c0594a5b376a85965783d94d09e0944cb7850cde48 for nc00/sg0001/TensorizerBIR +2025-08-07T13:53:59Z INFO 49125 [Tensorizer]: Weights total number of bytes: 196608 +2025-08-07T13:53:59Z INFO 49125 [Tensorizer]: Successfully built model. 
+2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/LowerIntrinsics]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.040 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/InlineNativeKernels]: Running InlineNativeKernels +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/InlineNativeKernels]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/InlineNativeKernels]: InlineNativeKernels finished after 0.002 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/LegalizeType]: Running LegalizeType +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/LegalizeType]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/LegalizeType]: LegalizeType finished after 0.004 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/NeuronLICM]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.012 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/InferPSumTensor]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.054 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/WeightCoalescing]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.003 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/LegalizeSundaAccess]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/LegalizeSundaAccess]: 
LegalizeSundaAccess finished after 0.040 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/RelaxPredicates]: Running RelaxPredicates +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/RelaxPredicates]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/RelaxPredicates]: RelaxPredicates finished after 0.004 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/TensorInitialization]: Running TensorInitialization +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/TensorInitialization]: Finished (changed=True) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/TensorInitialization]: TensorInitialization finished after 0.022 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.017 seconds +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-08-07T13:53:59Z INFO 49124 [sg0000/Tensorizer/ExpandISAMacro]: Finished (changed=False) +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.003 seconds +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/SimplifyNeuronTensor]: Finished (changed=True) +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.015 seconds +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/DMALocalityOpt]: Finished (changed=True) +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.002 seconds +2025-08-07T13:54:00Z INFO 49124 
[sg0000/Tensorizer/DataStreaming]: Running DataStreaming +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/DataStreaming]: Finished (changed=True) +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/DataStreaming]: DataStreaming finished after 0.006 seconds +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.277 seconds +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/LateLegalizeInst]: Finished (changed=False) +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.007 seconds +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/CoalesceCCOp]: Finished (changed=True) +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.005 seconds +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/SimpleAllReduceTiling]: Finished (changed=True) +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.002 seconds +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 163.614us (32.000MiB, est bw: 205.083GB/s, 29.598% of tot. 
time) for bfloat16<128 x 2048> TongaSB partitions[4] bfloat16 (2, 4, 4, 2, 128, 2048) %1536[i47_0_0,i48_0_1535,i32_0_0_1,c2_1255,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (4, 4, 128, 2, 2048) %'input83'[i48_0_1535,i32_0_0_1,i0.128,c2_1255,i1.2048] # id=1415, src_id=None, , instances=64 # dl = tensor_op_name: _dot.2 | hlo_id: 34 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 99.175us (16.000MiB, est bw: 169.167GB/s, 17.941% of tot. time) for bfloat16<128 x 1024> TongaSB partitions[4] bfloat16 (2, 2, 2, 4, 128, 4, 512) %'input77_local_1302'[i122_0_0_0_1308_0_0_1537,i122_0_0_0_1308_0_1_1537,i122_0_0_0_1,c1_1295_2205,i0.128,i3.4,i1.128+128i2.2+256p_1838_2205] = load bfloat16<128 x 1024> {'CrossPassTensor': ''}bfloat16 (8, 2, 128, 4, 4, 2, 128) %'input77'[4i122_0_0_0_1308_0_0_1537+2i122_0_0_0_1308_0_1_1537+i122_0_0_0_1,p_1838_2205,i0.128,c1_1295_2205,i3.4,i2.2,i1.128] # id=1520, src_id=None, , instances=64 # dl = tensor_op_name: _dot.3 | hlo_id: 145 | [[i0.128];[i1.128, i2.2, i3.4]] -> [[i0.128];[i1.128, i2.2, i3.4]] +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 50.238us (8.000MiB, est bw: 166.979GB/s, 9.088% of tot. time) for bfloat16<128 x 1024> TongaSB partitions[3] bfloat16 (2, 2, 8, 128, 2, 512) %'intermediate1_pftranspose_1180'[T_i1_1_0_1184,T_i1_0_1184,i1_1_1_0_2202,i0.128,i2.2,i1.512] = load bfloat16<128 x 1024> DRAM2DBlk partitions[1] bfloat16 (2, 1, 2, 8, 128, 1024) %'all_gather.1'[T_i1_1_0_1184,0,T_i1_0_1184,i1_1_1_0_2202,i0.128,i1.512+512i2.2] # id=1374, src_id=None, , instances=32 # dl = tensor_op_name: UnnamedModule | hlo_id: 1 | [[i0.128];[i1.512, i2.2]] -> [[i0.128];[i1.512, i2.2]] +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 41.879us (8.000MiB, est bw: 200.308GB/s, 7.576% of tot. 
time) for bfloat16<128 x 2048> TongaSB partitions[2] bfloat16 (2, 8, 128, 2, 1024) %'all_gather.1_local_1239'[i29_0_1_0_1243,i29_0_1_1_1243,i0.128,i2.2,i1.1024] = load bfloat16<128 x 2048> DRAM2DBlk partitions[1] bfloat16 (2, 1, 2, 8, 128, 1024) %'all_gather.1'[i29_0_1_0_1243,0,i2.2,i29_0_1_1_1243,i0.128,i1.1024] # id=1410, src_id=None, , instances=16 # dl = tensor_op_name: _custom-call.226 | hlo_id: 27 | [[i0.128];[i1.1024, i2.2]] -> [[i0.128];[i1.1024, i2.2]] +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 30.524us (8.000MiB, est bw: 274.819GB/s, 5.522% of tot. time) for bfloat16<128 x 1024> {'IntermediateTensor': ''}bfloat16 (2, 4, 128, 2, 2, 1024) %'intermediate1'(init=0.0)[T_i0_0_1184,T_i0_1_1184_0,i0.128,T_i1_0_1184,T_i1_1_0_1184,i1.1024] = store bfloat16<128 x 1024> TongaSB partitions[4] bfloat16 (2, 2, 2, 4, 128, 1024) %'1180.1850'[T_i1_1_0_1184,T_i1_0_1184,T_i0_0_1184,T_i0_1_1184_0,i0.128,i1.1024] # id=1553, src_id=None, , instances=32 # dl = tensor_op_name: intermediate1_pftranspose_1180 | hlo_id: 1 | [[i0.128];[i1.1024]] -> [[i0.128];[i1.1024]] +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 30.524us (8.000MiB, est bw: 274.819GB/s, 5.522% of tot. time) for bfloat16<128 x 1024> non_local bfloat16 (4194304,) %'dot.4-buffer-2238'[2048i122_0_0_0_1308_0_0_1537+4096i0.128+1024i122_0_0_0_1308_0_1_1537+i1.1024+2097152i123_0_0_1308_1537+524288i123_0_1_1308_1537] = store bfloat16<128 x 1024> TongaSB partitions[4] bfloat16 (2, 2, 2, 4, 128, 1024) %1309[i122_0_0_0_1308_0_0_1537,i122_0_0_0_1308_0_1_1537,i123_0_0_1308_1537,i123_0_1_1308_1537,i0.128,i1.1024] # id=1523, src_id=None, , instances=32 # dl = tensor_op_name: _dot.3 | hlo_id: 145 | [[i0.128];[i1.1024]] -> [[i0.128];[i1.1024]] +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 25.769us (4.000MiB, est bw: 162.767GB/s, 4.662% of tot. 
time) for bfloat16<128 x 1024> TongaSB partitions[2] bfloat16 (2, 8, 128, 1024) %'transpose.1_pftranspose_1175'[T_i12_0_1179,i13_0,i0.128,i1.1024] = indirect_load bfloat16<128 x 1024> {'CrossPassTensor': ''}bfloat16 (151936, 2, 1024) %'input76'[i0.128,T_i12_0_1179,i1.1024] generic generic_dims:[0] generic_addrs: int32<128 x 1> TongaSB partitions[0] int32 (128, 8, 1) %'input0_local_1215'[i0.128,i13_0,0] # id=1371, src_id=None, , attrs={'mode': OOBMode.ERROR}, instances=16 # dl = tensor_op_name: _gather.41 | hlo_id: 16 | [[i0.128];[i1.1024]] -> [[i0.128];[i1.1024]] +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 21.589us (4.000MiB, est bw: 194.277GB/s, 3.906% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[2] bfloat16 (4, 2, 128, 2048) %'input81_local_1276'[i120_0_2206,c1_1270_2203_2206,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (4, 128, 2, 2048) %'input81'[i120_0_2206,i0.128,c1_1270_2203_2206,i1.2048] # id=1460, src_id=None, , instances=8 # dl = tensor_op_name: _dot.1 | hlo_id: 82 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 21.589us (4.000MiB, est bw: 194.277GB/s, 3.906% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[2] bfloat16 (4, 2, 128, 2048) %'input78_local_1291'[i120_0_2207,c1_1285_2204_2207,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (4, 128, 2, 2048) %'input78'[i120_0_2207,i0.128,c1_1285_2204_2207,i1.2048] # id=1514, src_id=None, , instances=8 # dl = tensor_op_name: _dot | hlo_id: 131 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/DMAProfiler]: Est. DMA time: 15.912us (4.000MiB, est bw: 263.593GB/s, 2.879% of tot. 
time) for bfloat16<128 x 1024> DRAM2DBlk partitions[1] bfloat16 (2, 1, 2, 4, 128, 8, 128) %'transpose.1'[T_i12_0_1179,0,T_i12_1_0_1179,T_i12_1_1_1179_0,i0.128,i2.4+4i3.2,i1.128] = store bfloat16<128 x 1024> TongaSB partitions[3] bfloat16 (2, 2, 4, 128, 2, 512) %'1175.1848'[T_i12_0_1179,T_i12_1_0_1179,T_i12_1_1_1179_0,i0.128,i3.2,i1.128+128i2.4] # id=1540, src_id=None, , instances=16 # dl = tensor_op_name: transpose.1_pftranspose_1175 | hlo_id: 16 | [[i0.128];[i1.128, i2.4, i3.2]] -> [[i0.128];[i1.128, i2.4, i3.2]] +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.005 seconds +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/OptimizeNKIKernels]: Running OptimizeNKIKernels +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/OptimizeNKIKernels]: Finished (changed=False) +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/OptimizeNKIKernels]: OptimizeNKIKernels finished after 0.002 seconds +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/CCOpFusion]: Finished (changed=True) +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.030 seconds +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/StaticProfiler]: Running StaticProfiler +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/StaticProfiler]: Finished (changed=False) +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/StaticProfiler]: StaticProfiler finished after 0.005 seconds +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/SplitAPUnionSets]: Running SplitAPUnionSets +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/SplitAPUnionSets]: Finished (changed=True) +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/SplitAPUnionSets]: SplitAPUnionSets finished after 0.041 seconds +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/LateLegalizePostSplit]: 
Running LateLegalizePostSplit +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/LateLegalizePostSplit]: Finished (changed=False) +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/LateLegalizePostSplit]: LateLegalizePostSplit finished after 0.003 seconds +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/DumpGraphAndMetadata]: Running DumpGraphAndMetadata +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/DumpGraphAndMetadata]: Finished (changed=False) +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/DumpGraphAndMetadata]: DumpGraphAndMetadata finished after 0.006 seconds +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/ZeroSizeTensorElimination]: Running ZeroSizeTensorElimination +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/ZeroSizeTensorElimination]: Finished (changed=False) +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/ZeroSizeTensorElimination]: ZeroSizeTensorElimination finished after 0.000 seconds +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/BirCodeGenLoop]: Running BirCodeGenLoop +2025-08-07T13:54:00Z INFO 49126 [sg0002/Tensorizer/SimplifyNeuronTensor]: Finished (changed=True) +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/BirCodeGenLoop]: Finished (changed=False) +2025-08-07T13:54:00Z INFO 49126 [sg0002/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 1.056 seconds +2025-08-07T13:54:00Z INFO 49126 [sg0002/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-08-07T13:54:00Z INFO 49126 [sg0002/Tensorizer/DMALocalityOpt]: Finished (changed=True) +2025-08-07T13:54:00Z INFO 49124 [sg0000/Tensorizer/BirCodeGenLoop]: BirCodeGenLoop finished after 0.063 seconds +2025-08-07T13:54:00Z INFO 49126 [sg0002/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.005 seconds +2025-08-07T13:54:00Z INFO 49126 [sg0002/Tensorizer/DataStreaming]: Running DataStreaming +2025-08-07T13:54:00Z INFO 49126 [sg0002/Tensorizer/DataStreaming]: Finished (changed=True) +2025-08-07T13:54:00Z INFO 49126 
[sg0002/Tensorizer/DataStreaming]: DataStreaming finished after 0.037 seconds +2025-08-07T13:54:00Z INFO 49126 [sg0002/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-08-07T13:54:00Z INFO 49124 [Tensorizer]: BirCodeGen estimate #instances=12096 in sg0000 +2025-08-07T13:54:00Z INFO 49124 [Tensorizer]: IR signature: 2cf1e920f2ce14b5e1349e8b1c65714884a444126d124dbda7ee4f3cff30972e for nc00/sg0000/TensorizerBIR +2025-08-07T13:54:00Z INFO 49124 [Tensorizer]: Weights total number of bytes: 196864 +2025-08-07T13:54:00Z INFO 49124 [Tensorizer]: Successfully built model. +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.275 seconds +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/LateLegalizeInst]: Finished (changed=True) +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.014 seconds +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/CoalesceCCOp]: Finished (changed=True) +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.014 seconds +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/SimpleAllReduceTiling]: Finished (changed=True) +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.009 seconds +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/DMAProfiler]: Est. 
DMA time: 3.014ms (594.000MiB, est bw: 206.636GB/s, 57.816% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[2] bfloat16 (594, 2, 128, 2048) %'700.1100'[i31_0,T_i1_0_2805,i0.128,i1.128+128i2.16] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (75968, 32, 128) %'input473'[128i31_0+i0.128,16T_i1_0_2805+i2.16,i1.128] # id=1099, src_id=None, , instances=1188 # dl = tensor_op_name: input473_pftranspose_700 | hlo_id: 90 | if -128i31_0-i0.128+75967 >= 0 [[i0.128];[i1.128, i2.16]] -> [[i0.128];[i1.128, i2.16]] +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 1.041ms (48.000MiB, est bw: 48.348GB/s, 19.968% of tot. time) for bfloat16<128 x 128> TongaSB partitions[5] bfloat16 (2, 2, 2, 2, 24, 128, 512) %'input469_local_773'[i15_0_0_779_0_0_1072,i15_0_0_779_0_1_1072,i15_0_0_1,c1_767,c2_768,i0.128,i1.128+128p_2181] = load bfloat16<128 x 128> {'CrossPassTensor': ''}bfloat16 (8, 4, 2, 128, 24, 128) %'input469'[4i15_0_0_779_0_0_1072+2i15_0_0_779_0_1_1072+i15_0_0_1,p_2181,c1_767,i0.128,c2_768,i1.128] # id=951, src_id=None, , instances=1536 # dl = tensor_op_name: _dot.256 | hlo_id: 59 | [[i0.128];[i1.128]] -> [[i0.128];[i1.128]] +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 488.243us (96.000MiB, est bw: 206.175GB/s, 9.365% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[4] bfloat16 (2, 2, 24, 2, 128, 2048) %1073[i11_0,i10_0_0,i10_0_1,c2_748,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 24, 128, 4096) %'input470'[i10_0_0,i10_0_1,i0.128,i1.2048+2048c2_748] # id=942, src_id=None, , instances=192 # dl = tensor_op_name: _dot.254 | hlo_id: 49 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 244.771us (48.000MiB, est bw: 205.627GB/s, 4.695% of tot. 
time) for bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 24, 2, 128, 2048) %'input472_local_763'[i12_0_0,4i12_0_1_0+i12_0_1_1,c2_758,i0.128,i1.2048] = load bfloat16<128 x 2048> {'CrossPassTensor': ''}bfloat16 (2, 24, 128, 4096) %'input472'[i12_0_0,4i12_0_1_0+i12_0_1_1,i0.128,i1.2048+2048c2_758] # id=945, src_id=None, , instances=96 # dl = tensor_op_name: _dot.255 | hlo_id: 40 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 191.807us (297.000KiB, est bw: 1.586GB/s, 3.679% of tot. time) for float32<1 x 128> {'no_delinear': '0'}non_local float32 (1, 75968) %'convert.59'[0,128i31_0+i0.128] = store float32<1 x 128> TongaSB partitions[1] float32 (594, 1, 128) %'dot.257.1110'[i31_0,0,i0.128] # id=1108, src_id=None, , instances=594 # dl = tensor_op_name: _dot.257 | hlo_id: 90 | if -128i31_0-i0.128+75967 >= 0 [[];[i0.128]] -> [[];[i0.128]] +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 41.879us (8.000MiB, est bw: 200.308GB/s, 0.803% of tot. time) for bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 4, 2, 128, 2048) %'704.2160'[i11_0,T_i1_0,T_i2_0_2803,i0.128,i1.2048] = load bfloat16<128 x 2048> non_local bfloat16 (2, 4, 128, 4096) %'add.9'[i11_0,T_i1_0,i0.128,i1.2048+2048T_i2_0_2803] # id=1074, src_id=None, , instances=16 # dl = tensor_op_name: add.9_pftranspose_704 | hlo_id: 27 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 41.879us (8.000MiB, est bw: 200.308GB/s, 0.803% of tot. 
time) for bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 4, 2, 128, 2048) %'708.2165'[T_i20_0_716,T_i1_0,T_i2_0_2804,i0.128,i1.2048] = load bfloat16<128 x 2048> non_local bfloat16 (4194304,) %'all_reduce.3-buffer-2825'[2097152T_i20_0_716+4096i0.128+524288T_i1_0+i1.2048+2048T_i2_0_2804] # id=1083, src_id=None, , instances=16 # dl = tensor_op_name: all_reduce.3_pftranspose_708 | hlo_id: 62 | [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 30.524us (8.000MiB, est bw: 274.819GB/s, 0.585% of tot. time) for bfloat16<128 x 1024> non_local bfloat16 (4194304,) %'dot.14-buffer-2823'[2048i15_0_0_779_0_0_1072+4096i0.128+1024i15_0_0_779_0_1_1072+i1.1024+2097152i16_0_0_779_1072+524288i16_0_1_779_1072] = store bfloat16<128 x 1024> TongaSB partitions[4] bfloat16 (2, 2, 2, 4, 128, 1024) %780[i15_0_0_779_0_0_1072,i15_0_0_779_0_1_1072,i16_0_0_779_1072,i16_0_1_779_1072,i0.128,i1.1024] # id=954, src_id=None, , instances=32 # dl = tensor_op_name: _dot.256 | hlo_id: 59 | [[i0.128];[i1.1024]] -> [[i0.128];[i1.1024]] +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 25.532us (8.000MiB, est bw: 328.547GB/s, 0.490% of tot. time) for bfloat16<128 x 2048> non_local bfloat16 (1024, 32, 128) %'convert.57'[512T_i20_0_716+i0.128+128T_i20_1_716_0,16T_i19_0_716_0_1170+i2.4+4i3.4,i1.128] = store bfloat16<128 x 2048> TongaSB partitions[3] bfloat16 (2, 2, 4, 128, 4, 512) %'712.2569'[T_i20_0_716,T_i19_0_716_0_1170,T_i20_1_716_0,i0.128,i3.4,i1.128+128i2.4] # id=1087, src_id=None, , instances=16 # dl = tensor_op_name: convert.57_pftranspose_712 | hlo_id: 70 | [[i0.128];[i1.128, i2.4, i3.4]] -> [[i0.128];[i1.128, i2.4, i3.4]] +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/DMAProfiler]: Est. DMA time: 22.647us (296.758KiB, est bw: 13.418GB/s, 0.434% of tot. 
time) for float32<1 x 15194> TongaSB partitions[1] float32 (5, 1, 15194) %'custom-call.411.1179'[i1,0,i0.15194] = load float32<1 x 15194> {'no_delinear': '0'}non_local float32 (1, 75968) %'convert.59'[15194i1+i0.15194] # id=1174, src_id=None, , instances=5 # dl = tensor_op_name: _custom-call.411 | hlo_id: 93 | if -15194i1-i0.15194+75967 >= 0 [[];[i0.15194]] -> [[];[i0.15194]] +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.012 seconds +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/OptimizeNKIKernels]: Running OptimizeNKIKernels +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DoNothing]: Running DoNothing +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DoNothing]: Finished (changed=True) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/FactorizeBlkDims]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.001 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronValueNumbering]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb 
+2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.001 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LowerTranspose]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LateNeuronInstComb]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SpillPSum]: Running SpillPSum +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SpillPSum]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SpillPSum]: SpillPSum finished after 0.001 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LowerIntrinsics]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LegalizeType]: Running LegalizeType +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LegalizeType]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LegalizeType]: LegalizeType finished after 0.000 seconds 
+2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/InferPSumTensor]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/WeightCoalescing]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LegalizeSundaAccess]: Finished (changed=True) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.002 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronSimplifyPredicates]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.003 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/ExpandISAMacro]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.001 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-08-07T13:54:01Z INFO 49126 
[cumsum/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DMALocalityOpt]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DataStreaming]: Running DataStreaming +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DataStreaming]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DataStreaming]: DataStreaming finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.003 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LateLegalizeInst]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/CoalesceCCOp]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 
0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DMAProfiler]: Est. DMA time: 5.852us (1.000MiB, est bw: 179.191GB/s, 59.288% of tot. time) for float32<128 x 2048> TongaSB partitions[0] float32 (128, 2048) %13[i0.128,i1.2048] = load float32<128 x 2048> float32 (1, 256) %'x'[i0.128,i1.2048] # id=8, src_id=None, , instances=1 # dl = tensor_op_name: | if i0.128 == 0 and -i1.2048+255 >= 0 [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DMAProfiler]: Est. DMA time: 4.018us (1.000MiB, est bw: 260.951GB/s, 40.712% of tot. time) for float32<128 x 2048> float32 (1, 256) %'y'[i0.128,i1.2048] = store float32<128 x 2048> TongaSB partitions[0] float32 (128, 2048) %11[i0.128,i1.2048] # id=10, src_id=None, , instances=1 # dl = tensor_op_name: | if i0.128 == 0 and -i1.2048+255 >= 0 [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.001 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DoNothing]: Running DoNothing +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DoNothing]: Finished (changed=True) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DoNothing]: DoNothing finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/FactorizeBlkDims]: Running FactorizeBlkDims +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/FactorizeBlkDims]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/FactorizeBlkDims]: FactorizeBlkDims finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-08-07T13:54:01Z INFO 49126 
[cumsum/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.001 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronValueNumbering]: Running NeuronValueNumbering +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronValueNumbering]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronValueNumbering]: NeuronValueNumbering finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronInstComb]: Running NeuronInstComb +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronInstComb]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronInstComb]: NeuronInstComb finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LowerTranspose]: Running LowerTranspose +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LowerTranspose]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LowerTranspose]: LowerTranspose finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LowerBroadcast]: Running LowerBroadcast +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LowerBroadcast]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LowerBroadcast]: LowerBroadcast finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LateNeuronInstComb]: Running LateNeuronInstComb +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LateNeuronInstComb]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LateNeuronInstComb]: LateNeuronInstComb finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SpillPSum]: Running SpillPSum +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SpillPSum]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SpillPSum]: SpillPSum finished after 0.001 seconds 
+2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LowerIntrinsics]: Running LowerIntrinsics +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LowerIntrinsics]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LowerIntrinsics]: LowerIntrinsics finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LegalizeType]: Running LegalizeType +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LegalizeType]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LegalizeType]: LegalizeType finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronLICM]: Running NeuronLICM +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronLICM]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronLICM]: NeuronLICM finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/InferPSumTensor]: Running InferPSumTensor +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/InferPSumTensor]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/InferPSumTensor]: InferPSumTensor finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/WeightCoalescing]: Running WeightCoalescing +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/WeightCoalescing]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/WeightCoalescing]: WeightCoalescing finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LegalizeSundaAccess]: Running LegalizeSundaAccess +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LegalizeSundaAccess]: Finished (changed=True) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LegalizeSundaAccess]: LegalizeSundaAccess finished after 0.002 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronSimplifyPredicates]: Running NeuronSimplifyPredicates +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronSimplifyPredicates]: 
Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/NeuronSimplifyPredicates]: NeuronSimplifyPredicates finished after 0.003 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/ExpandISAMacro]: Running ExpandISAMacro +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/ExpandISAMacro]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/ExpandISAMacro]: ExpandISAMacro finished after 0.001 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SimplifyNeuronTensor]: Running SimplifyNeuronTensor +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SimplifyNeuronTensor]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SimplifyNeuronTensor]: SimplifyNeuronTensor finished after 0.001 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DMALocalityOpt]: Running DMALocalityOpt +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DMALocalityOpt]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DMALocalityOpt]: DMALocalityOpt finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DataStreaming]: Running DataStreaming +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DataStreaming]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DataStreaming]: DataStreaming finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SFKVectorizer]: Running SFKVectorizer +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SFKVectorizer]: Finished (changed=True) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SFKVectorizer]: SFKVectorizer finished after 0.003 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LateLegalizeInst]: Running LateLegalizeInst +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LateLegalizeInst]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/LateLegalizeInst]: LateLegalizeInst finished after 0.000 seconds 
+2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/CoalesceCCOp]: Running CoalesceCCOp +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/CoalesceCCOp]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/CoalesceCCOp]: CoalesceCCOp finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SimpleAllReduceTiling]: Running SimpleAllReduceTiling +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SimpleAllReduceTiling]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/SimpleAllReduceTiling]: SimpleAllReduceTiling finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DMAProfiler]: Running DMAProfiler +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DMAProfiler]: Top 10 (estimated) latency DMAs: +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DMAProfiler]: Est. DMA time: 5.852us (1.000MiB, est bw: 179.191GB/s, 59.288% of tot. time) for float32<128 x 2048> TongaSB partitions[0] float32 (128, 2048) %13[i0.128,i1.2048] = load float32<128 x 2048> float32 (1, 256) %'x'[i0.128,i1.2048] # id=8, src_id=None, , instances=1 # dl = tensor_op_name: | if i0.128 == 0 and -i1.2048+255 >= 0 [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DMAProfiler]: Est. DMA time: 4.018us (1.000MiB, est bw: 260.951GB/s, 40.712% of tot. 
time) for float32<128 x 2048> float32 (1, 256) %'y'[i0.128,i1.2048] = store float32<128 x 2048> TongaSB partitions[0] float32 (128, 2048) %11[i0.128,i1.2048] # id=10, src_id=None, , instances=1 # dl = tensor_op_name: | if i0.128 == 0 and -i1.2048+255 >= 0 [[i0.128];[i1.2048]] -> [[i0.128];[i1.2048]] +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DMAProfiler]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [cumsum/Tensorizer/DMAProfiler]: DMAProfiler finished after 0.001 seconds +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/OptimizeNKIKernels]: Finished (changed=True) +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/OptimizeNKIKernels]: OptimizeNKIKernels finished after 0.388 seconds +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/CCOpFusion]: Running CCOpFusion +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/CCOpFusion]: Finished (changed=True) +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/CCOpFusion]: CCOpFusion finished after 0.033 seconds +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/StaticProfiler]: Running StaticProfiler +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/StaticProfiler]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/StaticProfiler]: StaticProfiler finished after 0.014 seconds +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/SplitAPUnionSets]: Running SplitAPUnionSets +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/SplitAPUnionSets]: Finished (changed=True) +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/SplitAPUnionSets]: SplitAPUnionSets finished after 0.113 seconds +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/LateLegalizePostSplit]: Running LateLegalizePostSplit +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/LateLegalizePostSplit]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/LateLegalizePostSplit]: LateLegalizePostSplit finished after 0.015 seconds +2025-08-07T13:54:01Z INFO 49126 
[sg0002/Tensorizer/DumpGraphAndMetadata]: Running DumpGraphAndMetadata +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/DumpGraphAndMetadata]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/DumpGraphAndMetadata]: DumpGraphAndMetadata finished after 0.087 seconds +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/ZeroSizeTensorElimination]: Running ZeroSizeTensorElimination +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/ZeroSizeTensorElimination]: Finished (changed=False) +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/ZeroSizeTensorElimination]: ZeroSizeTensorElimination finished after 0.000 seconds +2025-08-07T13:54:01Z INFO 49126 [sg0002/Tensorizer/BirCodeGenLoop]: Running BirCodeGenLoop +2025-08-07T13:54:02Z INFO 49126 [sg0002/Tensorizer/BirCodeGenLoop]: Finished (changed=False) +2025-08-07T13:54:02Z INFO 49126 [sg0002/Tensorizer/BirCodeGenLoop]: BirCodeGenLoop finished after 0.477 seconds +2025-08-07T13:54:02Z INFO 49126 [Tensorizer]: BirCodeGen estimate #instances=106646 in sg0002 +2025-08-07T13:54:02Z INFO 49126 [Tensorizer]: IR signature: 8d77f77b258269ebb8a4baf6acd121a87992261d39af9caef170466eb257b177 for nc00/sg0002/TensorizerBIR +2025-08-07T13:54:02Z INFO 49126 [Tensorizer]: Weights total number of bytes: 135176 +2025-08-07T13:54:02Z INFO 49126 [Tensorizer]: Successfully built model. 
+2025-08-07T13:54:02Z USER 48502 [root/Tensorizer/Tensorizer]: Tensorizer finished after 10.055 seconds +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: End tensorization +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input76 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input0 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input79 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input83 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input82 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input1 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input81 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input80 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input78 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input77 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input4 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input2 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input5 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input86 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input87 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input85 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input84 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input90 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input94 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input93 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input92 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input91 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input89 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input88 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: 
Network input: input6 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input2 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input7 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input471 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input472 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input470 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input469 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input474 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input1 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input473 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Network input: input3 +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: wrote bir.json +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: wrote tensor_map.json +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: wrote bir.json +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: wrote tensor_map.json +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: wrote bir.json +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: wrote tensor_map.json +2025-08-07T13:54:02Z INFO 48502 [job.Frontend.0]: Job #0 finished +2025-08-07T13:54:02Z INFO 48502 [pipeline.Pipeline.0]: Finished job job.Frontend.0 +2025-08-07T13:54:02Z INFO 48502 [pipeline.Pipeline.0]: Starting job job.StaticIOTranspose.0 +2025-08-07T13:54:02Z INFO 48502 [pipeline.Pipeline.0]: Finished job job.StaticIOTranspose.0 +2025-08-07T13:54:02Z INFO 48502 [pipeline.Pipeline.0]: Starting job job.WalrusDriver.0 +2025-08-07T13:54:02Z INFO 48502 [job.WalrusDriver.0]: BackendDriver has 3 states with 1 core LNC +2025-08-07T13:54:02Z INFO 48502 [job.WalrusDriver.0]: BackendDriver MT cwd: /home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/neuronxcc-dfnjq5y6 +2025-08-07T13:54:02Z INFO 48502 [job.BIRLinker.1]: Creating directory sgLnk/sg00 +2025-08-07T13:54:02Z INFO 48502 [job.WalrusDriver.0]: StateId 
sg00 Dir /home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/neuronxcc-dfnjq5y6/sg00 +2025-08-07T13:54:02Z INFO 48502 [job.WalrusDriver.0]: StateId sg01 Dir /home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/neuronxcc-dfnjq5y6/sg01 +2025-08-07T13:54:02Z INFO 48502 [job.WalrusDriver.0]: StateId sg02 Dir /home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/neuronxcc-dfnjq5y6/sg02 +2025-08-07T13:54:02Z INFO 48502 [job.WalrusDriver.0]: Number of subgraphs to link: 3 +2025-08-07T13:54:02Z INFO 48502 [job.WalrusDriver.0]: lnkState: {"model": ["/home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/model.MODULE_b3ddbc97e5f0d1d64c82+155de413.hlo_module.pb"], "tensormap": "tensor_map.json", "bir": "bir.json", "lorean_sg_key": null, "input_name_map": null, "output_name_map": null, "constant_tensors": null, "state_dir": "/home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/neuronxcc-dfnjq5y6/sgLnk/sg00", "state_id": "sgLnk"} +2025-08-07T13:54:02Z INFO 48502 [job.WalrusDriver.0]: BackendDriver in_state.num_states 3 with 1 core LNC +2025-08-07T13:54:02Z INFO 48502 [job.WalrusDriver.0]: Executing /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/starfish/bin/walrus_driver --optlevel 2 --allocator coloring --verbose 35 --logfile-verbose 20 --logfile /home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/log-neuron-cc.txt -o walrus_bir.out.json --enable-call-graph --enable-mt-backend --link-subgraphs sg00,sg01,sg02 --link-dir sgLnk/sg00 --execute-repetition 1 -i bir.json --min_split_size 10240 --skip_split_vns '' --no_split_dram --split_huge_dram_tensor 1.0 --preprocessing_only --max_tensorizer_distance 64 --pack_same_shape_only --instruction_fetch_latency 511 --max-partitions 1 --policy 3 --auxflag 0 --interleave none --schedule-delayed-latency 1 --postsched-mm-accum-reorder=false --max-load-lower-bound 0.14 --force-prefetch-follow-incoming-order -1 --allreduce-buffer-size 500 --dram-page-size 512 --dram-rotation-size -1 --allreduce-rotation-dis 8 
--repeat-load-thres 4 --enable-mm-transpose-remat-optimization=true --save-len-thres 512 --save-dma-cnt-thres 32 --relaxed-order=true --enable-anti-dependence-reduction=false --num-semaphores-per-queue 16 --numcores 1 --act-root-json /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/pwp/pwp_bin_trainium/act_info.json --dve-root-json /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/dve/dve_bin_gen2/dve_info.json --enable-verifier=true --enable-birsim=false --enable-birsim-sync-only=false --enable-data-race-checker=false --enable-new-backend=true --inject-error=NONE --enable-internal-partitioner --dge-levels io,vector_dynamic_offsets,scalar_dynamic_offset --dynamic-dma-scratch-size-per-partition=16384 --neff-output-filename /home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/model.MODULE_b3ddbc97e5f0d1d64c82+155de413.neff +2025-08-07T13:54:02Z INFO 48502 [job.WalrusDriver.0]: Working directory is /home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/neuronxcc-dfnjq5y6 +2025-08-07T13:54:02Z INFO 48502 [job.WalrusDriver.0]: propagate_exit=True +2025-08-07T13:54:02Z INFO 48502 [job.WalrusDriver.0]: use_logger=False +2025-08-07T13:54:02Z INFO 48502 [job.WalrusDriver.0]: expose_stderr=True +2025-08-07T13:54:02Z INFO 49414 [Logging]: Logging to ../log-neuron-cc.txt at level 'INFO' +2025-08-07T13:54:02Z INFO 49414 [BackendDriver]: max_allowed_parallelism=128 +2025-08-07T13:54:02Z INFO 49414 [BackendDriver]: Loading module from sg00/bir.json +2025-08-07T13:54:02Z INFO 49414 [BackendDriver]: Loading module from sg01/bir.json +2025-08-07T13:54:02Z INFO 49414 [BackendDriver]: Loading module from sg02/bir.json +2025-08-07T13:54:02Z INFO 49414 [BackendDriver]: Backend driver mtBackend: true numModules: 3 Cwd: "/home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/neuronxcc-dfnjq5y6" +2025-08-07T13:54:02Z INFO 49414 [BackendDriver]: DynamicDMA is enabled +2025-08-07T13:54:02Z INFO 49414 [BackendDriver]: DynamicDMA 
levels being enabled: io, scalar_dynamic_offset, vector_dynamic_offsets, +2025-08-07T13:54:02Z INFO 49414 [BackendDriver]: Modular flow call graph is enabled +2025-08-07T13:54:02Z INFO 49414 [BackendDriver]: Internal partitioner is enabled +2025-08-07T13:54:02Z USER 49414 [BackendPassManager]: Running mod_parallel_pass +2025-08-07T13:54:02Z INFO 49414 [BackendPassManager]: Inputs to mod_parallel_pass: modules=3 functions=3 allocs=671 blocks=3 instructions=1038 Max writers: 191 Max Readers: 475 +2025-08-07T13:54:02Z USER 49414 (sg00) [ModuleForkPass]: Running do_nothing +2025-08-07T13:54:02Z USER 49414 (sg02) [ModuleForkPass]: Running do_nothing +2025-08-07T13:54:02Z USER 49414 (sg01) [ModuleForkPass]: Running do_nothing +2025-08-07T13:54:02Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to do_nothing: modules=1 functions=1 allocs=192 blocks=1 instructions=40 Max writers: 12 Max Readers: 11 +2025-08-07T13:54:02Z USER 49414 (sg00) [ModuleForkPass]: do_nothing finished after 0.000 seconds +2025-08-07T13:54:02Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 80mb, ru_maxrss: 200mb (delta=0mb) +2025-08-07T13:54:02Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 192 memory location(s), 1 block(s), and 40 instruction(s). 
Max writers: 12 Max Readers: 11 +2025-08-07T13:54:02Z USER 49414 (sg00) [ModuleForkPass]: Running birverifier +2025-08-07T13:54:02Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to do_nothing: modules=1 functions=1 allocs=138 blocks=1 instructions=45 Max writers: 2 Max Readers: 9 +2025-08-07T13:54:02Z USER 49414 (sg01) [ModuleForkPass]: do_nothing finished after 0.000 seconds +2025-08-07T13:54:02Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=192 blocks=1 instructions=40 Max writers: 12 Max Readers: 11 +2025-08-07T13:54:02Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 80mb, ru_maxrss: 200mb (delta=0mb) +2025-08-07T13:54:02Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 138 memory location(s), 1 block(s), and 45 instruction(s). Max writers: 2 Max Readers: 9 +2025-08-07T13:54:02Z USER 49414 (sg01) [ModuleForkPass]: Running birverifier +2025-08-07T13:54:02Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=138 blocks=1 instructions=45 Max writers: 2 Max Readers: 9 +2025-08-07T13:54:02Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to do_nothing: modules=1 functions=1 allocs=341 blocks=1 instructions=953 Max writers: 191 Max Readers: 475 +2025-08-07T13:54:02Z USER 49414 (sg02) [ModuleForkPass]: do_nothing finished after 0.000 seconds +2025-08-07T13:54:02Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 80mb, ru_maxrss: 200mb (delta=0mb) +2025-08-07T13:54:02Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 341 memory location(s), 1 block(s), and 953 instruction(s). 
Max writers: 191 Max Readers: 475 +2025-08-07T13:54:02Z USER 49414 (sg02) [ModuleForkPass]: Running birverifier +2025-08-07T13:54:02Z WARNING 49414 [birverifier::InstVisitor]: (sg00) Non - output memory location with no reader: {convert.270.1874}@SB<0,0>(1x2)#Internal DebugInfo: +2025-08-07T13:54:02Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=341 blocks=1 instructions=953 Max writers: 191 Max Readers: 475 +2025-08-07T13:54:02Z USER 49414 (sg00) [ModuleForkPass]: birverifier finished after 0.016 seconds +2025-08-07T13:54:02Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 122mb, ru_maxrss: 200mb (delta=0mb) +2025-08-07T13:54:02Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 192 memory location(s), 1 block(s), and 40 instruction(s). Max writers: 12 Max Readers: 11 +2025-08-07T13:54:02Z USER 49414 (sg01) [ModuleForkPass]: birverifier finished after 0.049 seconds +2025-08-07T13:54:02Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 208mb, ru_maxrss: 207mb (delta=7mb) +2025-08-07T13:54:02Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 138 memory location(s), 1 block(s), and 45 instruction(s). Max writers: 2 Max Readers: 9 +2025-08-07T13:54:02Z USER 49414 (sg02) [ModuleForkPass]: birverifier finished after 0.178 seconds +2025-08-07T13:54:02Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 341mb, ru_maxrss: 341mb (delta=141mb) +2025-08-07T13:54:02Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 341 memory location(s), 1 block(s), and 953 instruction(s). 
Max writers: 191 Max Readers: 475 +2025-08-07T13:54:02Z USER 49414 [ModuleForkPass]: Compilation status: Total modules: 3, Passed: 3, Failed: 0 +2025-08-07T13:54:02Z USER 49414 [BackendPassManager]: mod_parallel_pass finished after 0.181 seconds +2025-08-07T13:54:02Z INFO 49414 [BackendPassManager]: curr_vmrss: 333mb, ru_maxrss: 341mb (delta=141mb) +2025-08-07T13:54:02Z INFO 49414 [BackendPassManager]: Output has 3 module(s), 3 function(s), 671 memory location(s), 3 block(s), and 1038 instruction(s). Max writers: 191 Max Readers: 475 +2025-08-07T13:54:02Z USER 49414 [BackendPassManager]: Running subgraph_parallel_pass +2025-08-07T13:54:02Z INFO 49414 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=3 functions=3 allocs=671 blocks=3 instructions=1038 Max writers: 191 Max Readers: 475 +2025-08-07T13:54:02Z USER 49414 (sg00) [SubgraphForkPass]: Running lnc_verifier +2025-08-07T13:54:02Z INFO 49414 (sg00) [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=192 blocks=1 instructions=40 Max writers: 12 Max Readers: 11 +2025-08-07T13:54:02Z USER 49414 (sg00) [SubgraphForkPass]: lnc_verifier finished after 0.000 seconds +2025-08-07T13:54:02Z INFO 49414 (sg00) [SubgraphForkPass]: curr_vmrss: 333mb, ru_maxrss: 341mb (delta=0mb) +2025-08-07T13:54:02Z USER 49414 (sg01) [SubgraphForkPass]: Running lnc_verifier +2025-08-07T13:54:02Z USER 49414 (sg02) [SubgraphForkPass]: Running lnc_verifier +2025-08-07T13:54:02Z INFO 49414 (sg00) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 192 memory location(s), 1 block(s), and 40 instruction(s). 
Max writers: 12 Max Readers: 11 +2025-08-07T13:54:02Z INFO 49414 (sg01) [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=138 blocks=1 instructions=45 Max writers: 2 Max Readers: 9 +2025-08-07T13:54:02Z USER 49414 (sg01) [SubgraphForkPass]: lnc_verifier finished after 0.000 seconds +2025-08-07T13:54:02Z INFO 49414 (sg01) [SubgraphForkPass]: curr_vmrss: 333mb, ru_maxrss: 341mb (delta=0mb) +2025-08-07T13:54:02Z INFO 49414 (sg02) [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=341 blocks=1 instructions=953 Max writers: 191 Max Readers: 475 +2025-08-07T13:54:02Z USER 49414 (sg02) [SubgraphForkPass]: lnc_verifier finished after 0.000 seconds +2025-08-07T13:54:02Z INFO 49414 (sg01) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 138 memory location(s), 1 block(s), and 45 instruction(s). Max writers: 2 Max Readers: 9 +2025-08-07T13:54:02Z INFO 49414 (sg02) [SubgraphForkPass]: curr_vmrss: 333mb, ru_maxrss: 341mb (delta=0mb) +2025-08-07T13:54:02Z INFO 49414 (sg02) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 341 memory location(s), 1 block(s), and 953 instruction(s). Max writers: 191 Max Readers: 475 +2025-08-07T13:54:02Z USER 49414 [SubgraphForkPass]: Compilation status: Total subgraphs: 3, Passed: 3, Failed: 0 +2025-08-07T13:54:02Z USER 49414 [BackendPassManager]: subgraph_parallel_pass finished after 0.001 seconds +2025-08-07T13:54:02Z INFO 49414 [BackendPassManager]: curr_vmrss: 333mb, ru_maxrss: 341mb (delta=0mb) +2025-08-07T13:54:02Z INFO 49414 [BackendPassManager]: Output has 3 module(s), 3 function(s), 671 memory location(s), 3 block(s), and 1038 instruction(s). 
Max writers: 191 Max Readers: 475 +2025-08-07T13:54:02Z USER 49414 [BackendPassManager]: Running mod_parallel_pass +2025-08-07T13:54:02Z INFO 49414 [BackendPassManager]: Inputs to mod_parallel_pass: modules=3 functions=3 allocs=671 blocks=3 instructions=1038 Max writers: 191 Max Readers: 475 +2025-08-07T13:54:02Z USER 49414 (sg00) [ModuleForkPass]: Running expand_replication +2025-08-07T13:54:02Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to expand_replication: modules=1 functions=1 allocs=192 blocks=1 instructions=40 Max writers: 12 Max Readers: 11 +2025-08-07T13:54:02Z INFO 49414 (sg00) [ExpandReplication]: Found 0 replicated matmults +2025-08-07T13:54:02Z USER 49414 (sg00) [ModuleForkPass]: expand_replication finished after 0.000 seconds +2025-08-07T13:54:02Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 333mb, ru_maxrss: 341mb (delta=0mb) +2025-08-07T13:54:02Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 192 memory location(s), 1 block(s), and 40 instruction(s). 
Max writers: 12 Max Readers: 11 +2025-08-07T13:54:02Z USER 49414 (sg00) [ModuleForkPass]: Running unroll +2025-08-07T13:54:02Z USER 49414 (sg01) [ModuleForkPass]: Running expand_replication +2025-08-07T13:54:02Z USER 49414 (sg02) [ModuleForkPass]: Running expand_replication +2025-08-07T13:54:02Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to unroll: modules=1 functions=1 allocs=192 blocks=1 instructions=40 Max writers: 12 Max Readers: 11 +2025-08-07T13:54:02Z INFO 49414 (sg00) [Unroll]: INFO (Unroll) Start unrolling at Thu Aug 7 13:54:02 2025 +2025-08-07T13:54:02Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to expand_replication: modules=1 functions=1 allocs=138 blocks=1 instructions=45 Max writers: 2 Max Readers: 9 +2025-08-07T13:54:02Z INFO 49414 (sg01) [ExpandReplication]: Found 0 replicated matmults +2025-08-07T13:54:02Z USER 49414 (sg01) [ModuleForkPass]: expand_replication finished after 0.000 seconds +2025-08-07T13:54:02Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 333mb, ru_maxrss: 341mb (delta=0mb) +2025-08-07T13:54:02Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to expand_replication: modules=1 functions=1 allocs=341 blocks=1 instructions=953 Max writers: 191 Max Readers: 475 +2025-08-07T13:54:02Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 138 memory location(s), 1 block(s), and 45 instruction(s). 
Max writers: 2 Max Readers: 9 +2025-08-07T13:54:02Z USER 49414 (sg01) [ModuleForkPass]: Running unroll +2025-08-07T13:54:02Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to unroll: modules=1 functions=1 allocs=138 blocks=1 instructions=45 Max writers: 2 Max Readers: 9 +2025-08-07T13:54:02Z INFO 49414 (sg02) [ExpandReplication]: Found 0 replicated matmults +2025-08-07T13:54:02Z INFO 49414 (sg01) [Unroll]: INFO (Unroll) Start unrolling at Thu Aug 7 13:54:02 2025 +2025-08-07T13:54:02Z USER 49414 (sg02) [ModuleForkPass]: expand_replication finished after 0.000 seconds +2025-08-07T13:54:02Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 333mb, ru_maxrss: 341mb (delta=0mb) +2025-08-07T13:54:02Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 341 memory location(s), 1 block(s), and 953 instruction(s). Max writers: 191 Max Readers: 475 +2025-08-07T13:54:02Z USER 49414 (sg02) [ModuleForkPass]: Running unroll +2025-08-07T13:54:02Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to unroll: modules=1 functions=1 allocs=341 blocks=1 instructions=953 Max writers: 191 Max Readers: 475 +2025-08-07T13:54:02Z INFO 49414 (sg02) [Unroll]: INFO (Unroll) Start unrolling at Thu Aug 7 13:54:02 2025 +2025-08-07T13:54:03Z INFO 49414 (sg00) [Unroll]: INFO (Unroll) DONE unrolling Thu Aug 7 13:54:02 2025 + +2025-08-07T13:54:03Z INFO 49414 (sg00) [Unroll]: sg0000 Instruction count after Unroll: +2025-08-07T13:54:03Z INFO 49414 (sg00) [Unroll]: Total count: 8384 +2025-08-07T13:54:03Z INFO 49414 (sg00) [Unroll]: Matmult: 4368 +2025-08-07T13:54:03Z INFO 49414 (sg00) [Unroll]: TensorScalarPtr: 986 +2025-08-07T13:54:03Z INFO 49414 (sg00) [Unroll]: GenericCopy: 705 +2025-08-07T13:54:03Z INFO 49414 (sg00) [Unroll]: TensorTensor: 644 +2025-08-07T13:54:03Z INFO 49414 (sg00) [Unroll]: TensorReduce: 448 +2025-08-07T13:54:03Z INFO 49414 (sg00) [Unroll]: Activation: 422 +2025-08-07T13:54:03Z INFO 49414 (sg00) [Unroll]: Memset: 202 +2025-08-07T13:54:03Z INFO 49414 (sg00) [Unroll]: Load: 
199 +2025-08-07T13:54:03Z INFO 49414 (sg00) [Unroll]: TensorScalarAffineSelect: 192 +2025-08-07T13:54:03Z INFO 49414 (sg00) [Unroll]: Save: 85 +2025-08-07T13:54:03Z INFO 49414 (sg00) [Unroll]: DMACopy: 82 +2025-08-07T13:54:03Z INFO 49414 (sg00) [Unroll]: Reciprocal: 32 +2025-08-07T13:54:03Z INFO 49414 (sg00) [Unroll]: Iota: 16 +2025-08-07T13:54:03Z INFO 49414 (sg00) [Unroll]: CollectiveCompute: 3 +2025-08-07T13:54:03Z INFO 49414 (sg00) [Unroll]: Unrolled DGE count with Dynamic AP: 80 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: unroll finished after 0.100 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 482mb, ru_maxrss: 482mb (delta=141mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4354 memory location(s), 1 block(s), and 8384 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z INFO 49414 (sg01) [Unroll]: INFO (Unroll) DONE unrolling Thu Aug 7 13:54:02 2025 + +2025-08-07T13:54:03Z INFO 49414 (sg01) [Unroll]: sg0001 Instruction count after Unroll: +2025-08-07T13:54:03Z INFO 49414 (sg01) [Unroll]: Total count: 20003 +2025-08-07T13:54:03Z INFO 49414 (sg01) [Unroll]: Matmult: 14360 +2025-08-07T13:54:03Z INFO 49414 (sg01) [Unroll]: Load: 2012 +2025-08-07T13:54:03Z INFO 49414 (sg01) [Unroll]: GenericCopy: 814 +2025-08-07T13:54:03Z INFO 49414 (sg01) [Unroll]: TensorScalarPtr: 780 +2025-08-07T13:54:03Z INFO 49414 (sg01) [Unroll]: TensorReduce: 576 +2025-08-07T13:54:03Z INFO 49414 (sg01) [Unroll]: Activation: 564 +2025-08-07T13:54:03Z INFO 49414 (sg01) [Unroll]: TensorTensor: 448 +2025-08-07T13:54:03Z INFO 49414 (sg01) [Unroll]: Select: 256 +2025-08-07T13:54:03Z INFO 49414 (sg01) [Unroll]: Save: 81 +2025-08-07T13:54:03Z INFO 49414 (sg01) [Unroll]: DMACopy: 66 +2025-08-07T13:54:03Z INFO 49414 (sg01) [Unroll]: Reciprocal: 32 +2025-08-07T13:54:03Z INFO 49414 (sg01) [Unroll]: Memset: 12 +2025-08-07T13:54:03Z INFO 49414 (sg01) [Unroll]: CollectiveCompute: 2 
+2025-08-07T13:54:03Z INFO 49414 (sg01) [Unroll]: Unrolled DGE count with Dynamic AP: 64 +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: unroll finished after 0.240 seconds +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 591mb, ru_maxrss: 591mb (delta=250mb) +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4712 memory location(s), 1 block(s), and 20003 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z INFO 49414 (sg02) [Unroll]: INFO (Unroll) DONE unrolling Thu Aug 7 13:54:02 2025 + +2025-08-07T13:54:03Z INFO 49414 (sg02) [Unroll]: sg0002 Instruction count after Unroll: +2025-08-07T13:54:03Z INFO 49414 (sg02) [Unroll]: Total count: 60307 +2025-08-07T13:54:03Z INFO 49414 (sg02) [Unroll]: Matmult: 48723 +2025-08-07T13:54:03Z INFO 49414 (sg02) [Unroll]: GenericCopy: 6298 +2025-08-07T13:54:03Z INFO 49414 (sg02) [Unroll]: Load: 3075 +2025-08-07T13:54:03Z INFO 49414 (sg02) [Unroll]: Save: 657 +2025-08-07T13:54:03Z INFO 49414 (sg02) [Unroll]: TensorTensor: 297 +2025-08-07T13:54:03Z INFO 49414 (sg02) [Unroll]: TensorScalarPtr: 279 +2025-08-07T13:54:03Z INFO 49414 (sg02) [Unroll]: Activation: 231 +2025-08-07T13:54:03Z INFO 49414 (sg02) [Unroll]: Max: 224 +2025-08-07T13:54:03Z INFO 49414 (sg02) [Unroll]: MaxIndex: 224 +2025-08-07T13:54:03Z INFO 49414 (sg02) [Unroll]: MatchReplace: 217 +2025-08-07T13:54:03Z INFO 49414 (sg02) [Unroll]: Gather: 35 +2025-08-07T13:54:03Z INFO 49414 (sg02) [Unroll]: Memset: 16 +2025-08-07T13:54:03Z INFO 49414 (sg02) [Unroll]: TensorReduce: 12 +2025-08-07T13:54:03Z INFO 49414 (sg02) [Unroll]: StreamShuffle: 4 +2025-08-07T13:54:03Z INFO 49414 (sg02) [Unroll]: Select: 4 +2025-08-07T13:54:03Z INFO 49414 (sg02) [Unroll]: Iota: 3 +2025-08-07T13:54:03Z INFO 49414 (sg02) [Unroll]: CollectiveCompute: 3 +2025-08-07T13:54:03Z INFO 49414 (sg02) [Unroll]: Reciprocal: 3 +2025-08-07T13:54:03Z INFO 49414 (sg02) [Unroll]: DMACopy: 2 +2025-08-07T13:54:03Z INFO 
49414 (sg02) [Unroll]: Unrolled DGE count with Dynamic AP: 1 +2025-08-07T13:54:03Z USER 49414 (sg02) [ModuleForkPass]: unroll finished after 0.629 seconds +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 739mb, ru_maxrss: 739mb (delta=398mb) +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11836 memory location(s), 1 block(s), and 60307 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z USER 49414 [ModuleForkPass]: Compilation status: Total modules: 3, Passed: 3, Failed: 0 +2025-08-07T13:54:03Z USER 49414 [BackendPassManager]: mod_parallel_pass finished after 0.645 seconds +2025-08-07T13:54:03Z INFO 49414 [BackendPassManager]: curr_vmrss: 554mb, ru_maxrss: 739mb (delta=398mb) +2025-08-07T13:54:03Z INFO 49414 [BackendPassManager]: Output has 3 module(s), 3 function(s), 20902 memory location(s), 3 block(s), and 88694 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z USER 49414 [BackendPassManager]: Running subgraph_parallel_pass +2025-08-07T13:54:03Z INFO 49414 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=3 functions=3 allocs=20902 blocks=3 instructions=88694 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z USER 49414 (sg00) [SubgraphForkPass]: Running dead_code_elim +2025-08-07T13:54:03Z USER 49414 (sg01) [SubgraphForkPass]: Running dead_code_elim +2025-08-07T13:54:03Z USER 49414 (sg02) [SubgraphForkPass]: Running dead_code_elim +2025-08-07T13:54:03Z INFO 49414 (sg00) [SubgraphForkPass]: Inputs to dead_code_elim: modules=1 functions=1 allocs=4354 blocks=1 instructions=8384 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z INFO 49414 (sg01) [SubgraphForkPass]: Inputs to dead_code_elim: modules=1 functions=1 allocs=4712 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z INFO 49414 (sg02) [SubgraphForkPass]: Inputs to dead_code_elim: modules=1 functions=1 allocs=11836 blocks=1 
instructions=60307 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z INFO 49414 (sg00) [DeadCodeElim]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:03Z INFO 49414 (sg00) [DeadCodeElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z INFO 49414 (sg00) [DeadCodeElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z INFO 49414 (sg00) [DeadCodeElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z USER 49414 (sg00) [SubgraphForkPass]: dead_code_elim finished after 0.013 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [SubgraphForkPass]: curr_vmrss: 557mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 3808 memory location(s), 1 block(s), and 8319 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z INFO 49414 (sg01) [DeadCodeElim]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:03Z INFO 49414 (sg01) [DeadCodeElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z INFO 49414 (sg01) [DeadCodeElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z INFO 49414 (sg01) [DeadCodeElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z USER 49414 (sg01) [SubgraphForkPass]: dead_code_elim finished after 0.051 seconds +2025-08-07T13:54:03Z INFO 49414 (sg01) [SubgraphForkPass]: curr_vmrss: 580mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg01) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 4625 memory location(s), 1 block(s), and 20003 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z INFO 49414 (sg02) [DeadCodeElim]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:03Z INFO 49414 (sg02) [DeadCodeElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z INFO 49414 (sg02) [DeadCodeElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z INFO 49414 (sg02) [DeadCodeElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z USER 49414 (sg02) [SubgraphForkPass]: dead_code_elim finished after 0.108 seconds +2025-08-07T13:54:03Z INFO 49414 (sg02) [SubgraphForkPass]: curr_vmrss: 581mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg02) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 11794 memory location(s), 1 block(s), and 60307 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z USER 49414 [SubgraphForkPass]: Compilation status: Total subgraphs: 3, Passed: 3, Failed: 0 +2025-08-07T13:54:03Z USER 49414 [BackendPassManager]: subgraph_parallel_pass finished after 0.111 seconds +2025-08-07T13:54:03Z INFO 49414 [BackendPassManager]: curr_vmrss: 581mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 [BackendPassManager]: Output has 3 module(s), 3 function(s), 20227 memory location(s), 3 block(s), and 88629 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z USER 49414 [BackendPassManager]: Running mod_parallel_pass +2025-08-07T13:54:03Z INFO 49414 [BackendPassManager]: Inputs to mod_parallel_pass: modules=3 functions=3 allocs=20227 blocks=3 instructions=88629 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: Running birverifier +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: Running birverifier +2025-08-07T13:54:03Z USER 49414 (sg02) [ModuleForkPass]: Running birverifier +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=3808 blocks=1 instructions=8319 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=4625 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=11794 blocks=1 instructions=60307 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: birverifier finished after 0.009 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 581mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3808 memory location(s), 1 block(s), and 8319 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: birverifier finished after 0.020 seconds +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 581mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4625 memory location(s), 1 block(s), and 20003 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z USER 49414 (sg02) [ModuleForkPass]: birverifier finished after 0.062 seconds +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11794 memory location(s), 1 block(s), and 60307 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z USER 49414 [ModuleForkPass]: Compilation status: Total modules: 3, Passed: 3, Failed: 0 +2025-08-07T13:54:03Z USER 49414 [BackendPassManager]: mod_parallel_pass finished after 0.065 seconds +2025-08-07T13:54:03Z INFO 49414 [BackendPassManager]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 [BackendPassManager]: Output has 3 module(s), 3 function(s), 20227 memory location(s), 3 block(s), and 88629 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z USER 49414 [BackendPassManager]: Running subgraph_parallel_pass +2025-08-07T13:54:03Z INFO 49414 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=3 functions=3 allocs=20227 blocks=3 instructions=88629 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z USER 49414 (sg00) [SubgraphForkPass]: Running lnc_verifier +2025-08-07T13:54:03Z INFO 49414 (sg00) [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=3808 blocks=1 instructions=8319 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [SubgraphForkPass]: lnc_verifier finished after 0.000 seconds +2025-08-07T13:54:03Z USER 49414 (sg01) [SubgraphForkPass]: Running lnc_verifier +2025-08-07T13:54:03Z INFO 49414 (sg00) [SubgraphForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z USER 49414 (sg02) [SubgraphForkPass]: Running lnc_verifier +2025-08-07T13:54:03Z INFO 49414 (sg00) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 3808 memory location(s), 1 block(s), and 8319 
instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z INFO 49414 (sg01) [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=4625 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z USER 49414 (sg01) [SubgraphForkPass]: lnc_verifier finished after 0.001 seconds +2025-08-07T13:54:03Z INFO 49414 (sg01) [SubgraphForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg01) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 4625 memory location(s), 1 block(s), and 20003 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z INFO 49414 (sg02) [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=1 allocs=11794 blocks=1 instructions=60307 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z USER 49414 (sg02) [SubgraphForkPass]: lnc_verifier finished after 0.001 seconds +2025-08-07T13:54:03Z INFO 49414 (sg02) [SubgraphForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg02) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 11794 memory location(s), 1 block(s), and 60307 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z USER 49414 [SubgraphForkPass]: Compilation status: Total subgraphs: 3, Passed: 3, Failed: 0 +2025-08-07T13:54:03Z USER 49414 [BackendPassManager]: subgraph_parallel_pass finished after 0.004 seconds +2025-08-07T13:54:03Z INFO 49414 [BackendPassManager]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 [BackendPassManager]: Output has 3 module(s), 3 function(s), 20227 memory location(s), 3 block(s), and 88629 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z USER 49414 [BackendPassManager]: Running mod_parallel_pass +2025-08-07T13:54:03Z INFO 49414 [BackendPassManager]: Inputs to mod_parallel_pass: modules=3 functions=3 allocs=20227 blocks=3 instructions=88629 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: Running instruction_reorder +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: Running instruction_reorder +2025-08-07T13:54:03Z USER 49414 (sg02) [ModuleForkPass]: Running instruction_reorder +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to instruction_reorder: modules=1 functions=1 allocs=3808 blocks=1 instructions=8319 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to instruction_reorder: modules=1 functions=1 allocs=4625 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to instruction_reorder: modules=1 functions=1 allocs=11794 blocks=1 instructions=60307 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: instruction_reorder finished after 0.002 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3808 memory location(s), 1 block(s), and 8319 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: Running psum_legalization +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to psum_legalization: modules=1 functions=1 allocs=3808 blocks=1 instructions=8319 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: psum_legalization finished after 0.001 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3808 memory location(s), 1 block(s), and 8319 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: Running legalize_cce_dma +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to legalize_cce_dma: modules=1 functions=1 allocs=3808 blocks=1 instructions=8319 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: legalize_cce_dma finished after 0.001 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3808 memory location(s), 1 block(s), and 8319 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: Running pre_opts +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to pre_opts: modules=1 functions=1 allocs=3808 blocks=1 instructions=8319 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreOpts]: Skipped. 
No pre-opt passes enabled +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: pre_opts finished after 0.000 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3808 memory location(s), 1 block(s), and 8319 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: Running error_injector +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to error_injector: modules=1 functions=1 allocs=3808 blocks=1 instructions=8319 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z WARNING 49414 (sg00) [ErrorInjector]: Unrecognized injected error value "0" +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: error_injector finished after 0.000 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3808 memory location(s), 1 block(s), and 8319 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: Running vn_splitter +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to vn_splitter: modules=1 functions=1 allocs=3808 blocks=1 instructions=8319 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z INFO 49414 (sg00) [VNSplitter]: INFO (VNSplitter) Collected all the internal vnodes: size = 0 +2025-08-07T13:54:03Z INFO 49414 (sg00) [VNSplitter]: INFO (VNSplitter) Done with analyze and splitting: total dead nodes = 0 +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: instruction_reorder finished after 0.006 seconds +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4625 memory location(s), 1 block(s), and 20003 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: Running psum_legalization +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to psum_legalization: modules=1 functions=1 allocs=4625 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: psum_legalization finished after 0.004 seconds +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4625 memory location(s), 1 block(s), and 20003 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: Running legalize_cce_dma +2025-08-07T13:54:03Z INFO 49414 [PerformanceProfiler]: number of tensorizer non-local-tensor caused reload left 0 +2025-08-07T13:54:03Z INFO 49414 [PerformanceProfiler]: number of tensorizer non-local-tensor caused spill left 0 +2025-08-07T13:54:03Z INFO 49414 (sg00) [VNSplitterPass]: INFO (VNSplitter) Time: 0 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [VNSplitterPass]: INFO (VerticalFusion) Time: 0.003 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [VNSplitterPass]: INFO (ShrinkDN) Time: 0.001 seconds +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: vn_splitter finished after 0.007 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to legalize_cce_dma: modules=1 functions=1 allocs=4625 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3808 memory location(s), 1 block(s), and 8319 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: Running constant_propagate +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to constant_propagate: modules=1 functions=1 allocs=3808 blocks=1 instructions=8319 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z INFO 49414 (sg00) [ConstantPropagate]: [Constant_propagate for select] directly remove instruction number: 0 +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: legalize_cce_dma finished after 0.003 seconds +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4625 memory location(s), 1 block(s), and 20003 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: Running pre_opts +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to pre_opts: modules=1 functions=1 allocs=4625 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z INFO 49414 (sg01) [PreOpts]: Skipped. No pre-opt passes enabled +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: pre_opts finished after 0.000 seconds +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4625 memory location(s), 1 block(s), and 20003 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: Running error_injector +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to error_injector: modules=1 functions=1 allocs=4625 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z WARNING 49414 (sg01) [ErrorInjector]: Unrecognized injected error value "0" +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: error_injector finished after 0.000 seconds +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4625 memory location(s), 1 block(s), and 20003 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: Running vn_splitter +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to vn_splitter: modules=1 functions=1 allocs=4625 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z INFO 49414 (sg01) [VNSplitter]: INFO (VNSplitter) Collected all the internal vnodes: size = 1 +2025-08-07T13:54:03Z INFO 49414 (sg01) [VNSplitter]: INFO (VNSplitter) Done with analyze and splitting: total dead nodes = 0 +2025-08-07T13:54:03Z INFO 49414 (sg00) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z INFO 49414 (sg00) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z INFO 49414 (sg00) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z USER 49414 (sg02) [ModuleForkPass]: instruction_reorder finished after 0.019 seconds +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ConstantPropagate]: [Constant_propagate for Affineselect] directly remove instruction number: 64 +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11794 memory location(s), 1 block(s), and 60307 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z USER 49414 (sg02) [ModuleForkPass]: Running psum_legalization +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to psum_legalization: modules=1 functions=1 allocs=11794 blocks=1 instructions=60307 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z INFO 49414 (sg00) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:03Z INFO 49414 (sg00) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z INFO 49414 (sg00) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z INFO 49414 (sg00) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: constant_propagate finished after 0.015 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3744 memory location(s), 1 block(s), and 8255 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: Running lower_ac +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to lower_ac: modules=1 functions=1 allocs=3744 blocks=1 instructions=8255 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z INFO 49414 (sg00) [LowerAC]: INFO (LowerAC) Lowered 0 loads, 0 saves, 0 copies. +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: lower_ac finished after 0.001 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3744 memory location(s), 1 block(s), and 8255 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: Running input_dma_coalescing +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to input_dma_coalescing: modules=1 functions=1 allocs=3744 blocks=1 instructions=8255 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z INFO 49414 (sg00) [DMAOptimizationBase]: DMA input Coalescing combined 0 input loads +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: input_dma_coalescing finished after 0.002 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3744 memory location(s), 1 block(s), and 8255 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: Running remat_optimization +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to remat_optimization: modules=1 functions=1 allocs=3744 blocks=1 instructions=8255 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z INFO 49414 (sg00) [RematOpt]: Removed 0 remat instructions +2025-08-07T13:54:03Z USER 49414 (sg02) [ModuleForkPass]: psum_legalization finished after 0.012 seconds +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: remat_optimization finished after 0.004 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3744 memory location(s), 1 block(s), and 8255 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: Running early_peephole_opts +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to early_peephole_opts: modules=1 functions=1 allocs=3744 blocks=1 instructions=8255 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z INFO 49414 (sg00) [EarlyPeepholeOpts]: PeepholeOpts enabled? ActivationAccumulate: true +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11794 memory location(s), 1 block(s), and 60307 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z USER 49414 (sg02) [ModuleForkPass]: Running legalize_cce_dma +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to legalize_cce_dma: modules=1 functions=1 allocs=11794 blocks=1 instructions=60307 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z INFO 49414 (sg00) [EarlyPeepholeOpts]: Activation Accumulate: 192 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: early_peephole_opts finished after 0.003 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3744 memory location(s), 1 block(s), and 8255 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: Running coalesce_multichannel_cc_ops +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to coalesce_multichannel_cc_ops: modules=1 functions=1 allocs=3744 blocks=1 instructions=8255 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: coalesce_multichannel_cc_ops finished after 0.000 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3744 memory location(s), 1 block(s), and 8255 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: Running infer_stream_ids +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to infer_stream_ids: modules=1 functions=1 allocs=3744 blocks=1 instructions=8255 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: infer_stream_ids finished after 0.000 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3744 memory location(s), 1 block(s), and 8255 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: Running pre_sched +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to pre_sched: modules=1 functions=1 allocs=3744 blocks=1 instructions=8255 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: Start PRE scheduling 2 cores: 1 at: Thu Aug 7 13:54:03 2025 +2025-08-07T13:54:03Z INFO 49414 [LayerSpiller]: LayerSpill: Start... +2025-08-07T13:54:03Z INFO 49414 [LayerSpiller]: LayerSpill: Found 1 Splits CCs +2025-08-07T13:54:03Z INFO 49414 [LayerSpiller]: Grouped CCs to 1 clusters. 
+2025-08-07T13:54:03Z INFO 49414 [LayerSpiller]: LayerSpill: To Spill 0 multi-layer tensors +2025-08-07T13:54:03Z INFO 49414 [LayerSpiller]: LayerSpill: set uninit flag on 0 insts +2025-08-07T13:54:03Z INFO 49414 [LayerSpiller]: LayerSpill: Done. +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: Start split live ranges Thu Aug 7 13:54:03 2025 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: No split opportunities: +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: End split live ranges Thu Aug 7 13:54:03 2025 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: Strt remove redundncies Thu Aug 7 13:54:03 2025 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: remove_redundant_memsets +2025-08-07T13:54:03Z INFO 49414 [PerformanceProfiler]: number of tensorizer non-local-tensor caused reload left 0 +2025-08-07T13:54:03Z INFO 49414 [PerformanceProfiler]: number of tensorizer non-local-tensor caused spill left 0 +2025-08-07T13:54:03Z INFO 49414 (sg01) [VNSplitterPass]: INFO (VNSplitter) Time: 0 seconds +2025-08-07T13:54:03Z INFO 49414 (sg01) [VNSplitterPass]: INFO (VerticalFusion) Time: 0.008 seconds +2025-08-07T13:54:03Z INFO 49414 (sg01) [VNSplitterPass]: INFO (ShrinkDN) Time: 0.009 seconds +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: vn_splitter finished after 0.026 seconds +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4625 memory location(s), 1 block(s), and 20003 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: Running constant_propagate +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to constant_propagate: modules=1 functions=1 allocs=4625 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z USER 49414 (sg02) [ModuleForkPass]: legalize_cce_dma finished after 0.012 seconds +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 582mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11794 memory location(s), 1 block(s), and 60307 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z USER 49414 (sg02) [ModuleForkPass]: Running pre_opts +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: remove_redundant_memsets: 0 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: remove_redundant_loads +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to pre_opts: modules=1 functions=1 allocs=11794 blocks=1 instructions=60307 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z INFO 49414 (sg02) [PreOpts]: Skipped. No pre-opt passes enabled +2025-08-07T13:54:03Z INFO 49414 (sg01) [ConstantPropagate]: [Constant_propagate for select] directly remove instruction number: 0 +2025-08-07T13:54:03Z USER 49414 (sg02) [ModuleForkPass]: pre_opts finished after 0.001 seconds +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 583mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11794 memory location(s), 1 block(s), and 60307 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z USER 49414 (sg02) [ModuleForkPass]: Running error_injector +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: remove_redundant_loads: 0 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: End remove redundncies Thu Aug 7 13:54:03 2025 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: Start DCE Thu Aug 7 13:54:03 2025 +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to error_injector: modules=1 functions=1 allocs=11794 blocks=1 instructions=60307 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z WARNING 49414 (sg02) [ErrorInjector]: Unrecognized injected error value "0" +2025-08-07T13:54:03Z USER 49414 (sg02) [ModuleForkPass]: error_injector finished after 0.001 seconds +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 583mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11794 memory location(s), 1 block(s), and 60307 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z USER 49414 (sg02) [ModuleForkPass]: Running vn_splitter +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to vn_splitter: modules=1 functions=1 allocs=11794 blocks=1 instructions=60307 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z INFO 49414 (sg02) [VNSplitter]: INFO (VNSplitter) Collected all the internal vnodes: size = 14 +2025-08-07T13:54:03Z INFO 49414 (sg02) [VNSplitter]: INFO (VNSplitter) Done with analyze and splitting: total dead nodes = 0 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: End DCE Thu Aug 7 13:54:03 2025 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: Start build flow dependencies Thu Aug 7 13:54:03 2025 +2025-08-07T13:54:03Z INFO 49414 (sg00) [build_flow_deps]: Start build fdeps. 
Invocation: 1Thu Aug 7 13:54:03 2025 +2025-08-07T13:54:03Z INFO 49414 (sg00) [build_flow_deps]: Allocs: 3744 instructions: 8255 +2025-08-07T13:54:03Z INFO 49414 (sg00) [build_flow_deps]: Build fdeps inserted 20578 edges +2025-08-07T13:54:03Z INFO 49414 (sg00) [build_flow_deps]: Done build fdeps 20578 Thu Aug 7 13:54:03 2025 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: End build flow dependencies Thu Aug 7 13:54:03 2025 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: Start remove useless insts Thu Aug 7 13:54:03 2025 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: remove_useless_insts +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: remove Useless Instructions: 0 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: End remove useless insts Thu Aug 7 13:54:03 2025 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: Start scratchpad optimization Thu Aug 7 13:54:03 2025 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: End scratchpad optimization Thu Aug 7 13:54:03 2025 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PreSched]: DONE PRE scheduling Thu Aug 7 13:54:03 2025 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: pre_sched finished after 0.046 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 584mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3744 memory location(s), 1 block(s), and 8255 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: Running tensor_copy_elim +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=3744 blocks=1 instructions=8255 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z INFO 49414 (sg00) [TensorCopyElim]: Tensor CP elimination: 0 +2025-08-07T13:54:03Z INFO 49414 (sg01) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:03Z INFO 49414 (sg00) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:03Z INFO 49414 (sg00) [TensorCopyElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z INFO 49414 (sg00) [TensorCopyElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z INFO 49414 (sg00) [TensorCopyElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: tensor_copy_elim finished after 0.008 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 584mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3744 memory location(s), 1 block(s), and 8255 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: Running dynamic_dma_setup +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to dynamic_dma_setup: modules=1 functions=1 allocs=3744 blocks=1 instructions=8255 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: dynamic_dma_setup finished after 0.000 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 584mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3745 memory location(s), 1 block(s), and 8255 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: Running runtime_memory_reservation +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to runtime_memory_reservation: modules=1 functions=1 allocs=3745 blocks=1 instructions=8255 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: runtime_memory_reservation finished after 0.000 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 584mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3745 memory location(s), 1 block(s), and 8255 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: Running coloring_allocator_psum +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to coloring_allocator_psum: modules=1 functions=1 allocs=3745 blocks=1 instructions=8255 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z INFO 49414 (sg00) [ColoringAllocator::Rep]: Allocating functions +2025-08-07T13:54:03Z INFO 49414 (sg00) [ColoringAllocator::Rep]: linearize and check +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: allocating PSUM +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: main loop +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: renumber locations +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: size = 1097 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: build_no_bitmap start +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: 100% PSUM demand before spilling +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: PSUM high-water mark = 8 tensors +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: found 2324 edges +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: mean: 4.23701 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: median: 3.46069 +2025-08-07T13:54:03Z INFO 49414 (sg00) 
[PSUM_Allocator]: adjacency vectors require 18592 bytes +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: build_no_bitmap done +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: find costs +2025-08-07T13:54:03Z INFO 49414 (sg01) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: best-of-n loop, heuristic = 0, allow_psum_spill_within_accum_group = false +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: simplify interference graph +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: initialize low and high +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: lo = 1097 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: hi = 0 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: inf = 0 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: total = 1097 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: simplify +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: new candidates = 0 +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: select ranges +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: no more spills +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: PSUM score = 0 (lower is better) +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: spilling from PSUM cost about 0 cycles +2025-08-07T13:54:03Z INFO 49414 (sg00) [PSUM_Allocator]: 100% PSUM utilization after allocation +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: coloring_allocator_psum finished after 0.016 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 584mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3745 memory location(s), 1 block(s), and 8255 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: Running dma_optimization_psum +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to dma_optimization_psum: modules=1 functions=1 allocs=3745 blocks=1 instructions=8255 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z INFO 49414 (sg01) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z INFO 49414 (sg02) [ShrinkDN]: INFO (ShrinkDN): Shrunk 2 nodes. Total savings 14336 bytes/partition +2025-08-07T13:54:03Z INFO 49414 (sg00) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload instructions +2025-08-07T13:54:03Z INFO 49414 (sg00) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload memory locations +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: dma_optimization_psum finished after 0.003 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 584mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3745 memory location(s), 1 block(s), and 8255 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: Running address_rotation_psum +2025-08-07T13:54:03Z INFO 49414 (sg01) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to address_rotation_psum: modules=1 functions=1 allocs=3745 blocks=1 instructions=8255 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z INFO 49414 (sg01) [ConstantPropagate]: [Constant_propagate for Affineselect] directly remove instruction number: 0 +2025-08-07T13:54:03Z INFO 49414 (sg00) [DMAOptimizationBase]: PSUM Rotation rotated 240 PSUM Banks +2025-08-07T13:54:03Z INFO 49414 (sg00) [DMAOptimizationBase]: PSUM Rotation rotated 40 PSUM Banks +2025-08-07T13:54:03Z INFO 49414 (sg00) [DMAOptimizationBase]: PSUM Rotation rotated 334 PSUM Banks +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: address_rotation_psum finished after 0.014 seconds +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 584mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3745 memory location(s), 1 block(s), and 8255 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z USER 49414 (sg00) [ModuleForkPass]: Running coloring_allocator_sb +2025-08-07T13:54:03Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to coloring_allocator_sb: modules=1 functions=1 allocs=3745 blocks=1 instructions=8255 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:03Z INFO 49414 (sg00) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes loaded 75576064 +2025-08-07T13:54:03Z INFO 49414 (sg00) [ColoringAllocator::Rep]: INFO: Pre GCA average loaded DMA size 2996 bytes +2025-08-07T13:54:03Z INFO 49414 (sg00) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes saved 22544386 +2025-08-07T13:54:03Z INFO 49414 (sg00) [ColoringAllocator::Rep]: INFO: Pre GCA average saved DMA size 2096 bytes +2025-08-07T13:54:03Z INFO 49414 (sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 6336512 +2025-08-07T13:54:03Z INFO 49414 (sg00) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 307 bytes +2025-08-07T13:54:03Z INFO 49414 (sg00) [ColoringAllocator::Rep]: Allocating functions +2025-08-07T13:54:03Z INFO 49414 (sg00) [ColoringAllocator::Rep]: linearize and check +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: allocating SB +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: main loop +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: renumber locations +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: size = 2617 +2025-08-07T13:54:03Z INFO 49414 [PerformanceProfiler]: number of tensorizer non-local-tensor caused reload left 0 +2025-08-07T13:54:03Z INFO 49414 [PerformanceProfiler]: number of tensorizer non-local-tensor caused spill left 0 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: find partners +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: found 755 accumulation groups +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: largest = custom-call.226.1608_i1 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: tensors = 33 +2025-08-07T13:54:03Z 
INFO 49414 (sg00) [SB_Allocator]: requires 66048 bytes/partition +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: expanding partners +2025-08-07T13:54:03Z INFO 49414 (sg02) [VNSplitterPass]: INFO (VNSplitter) Time: 0.001 seconds +2025-08-07T13:54:03Z INFO 49414 (sg02) [VNSplitterPass]: INFO (VerticalFusion) Time: 0.027 seconds +2025-08-07T13:54:03Z INFO 49414 (sg02) [VNSplitterPass]: INFO (ShrinkDN) Time: 0.031 seconds +2025-08-07T13:54:03Z USER 49414 (sg02) [ModuleForkPass]: vn_splitter finished after 0.084 seconds +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 584mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11794 memory location(s), 1 block(s), and 60307 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z USER 49414 (sg02) [ModuleForkPass]: Running constant_propagate +2025-08-07T13:54:03Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to constant_propagate: modules=1 functions=1 allocs=11794 blocks=1 instructions=60307 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:03Z INFO 49414 []: find first defs for local +2025-08-07T13:54:03Z INFO 49414 []: find first defs for global +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: find loads +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: 1 pin count +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: 133 remat count +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: 1 pinned tensors will require about 16384 bytes/partition +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: build interference graph +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: pass 1 int-tree +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Num intervals 2617 Num locations 2617 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: IntervalTree Build Done +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: info.neighbors init Done +2025-08-07T13:54:03Z INFO 49414 (sg00) 
[SB_Allocator]: info.neighbors partners Done +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: IntervalTree readback Done +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: edge: 256786 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: mean: 196.245 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: median: 131.431 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: find costs +2025-08-07T13:54:03Z INFO 49414 (sg01) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: best-of-n loop, heuristic = 0 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: simplify interference graph +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: initialize safe & unsafe +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: safe = 480 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: unsafe = 1849 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: inf = 287 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: total = 2616 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: simplify +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: simplify_step3_sorted2 #Unsafe 1849 #Pinned 0 #Safe 0 minCost 0.000374499 maxCost 0.0895006 locations 2617 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: new candidates = 201 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: select ranges +2025-08-07T13:54:03Z INFO 49414 (sg02) [ConstantPropagate]: [Constant_propagate for select] directly remove instruction number: 0 +2025-08-07T13:54:03Z INFO 49414 (sg01) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Total: 2616 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Spilled: 0.021 (54) +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Allocated: 0.979 (2562) +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Rover zone: 0.194 (497) +2025-08-07T13:54:03Z INFO 49414 (sg00) 
[SB_Allocator]: Pre-rover zone: 0.007 (17) +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Post-rover zone: 0.799 (2048) +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Slice zone: 0.000 (0) +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Blocks nothing: 0.000 (0) +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Blocks medium: 0.000 (0) +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Blocks tall: 1.000 (2562) +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Visited until tall blocking (mean): 0.999 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Visited until tall blocking (median): 1.000 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Visited until tall blocking (p95): 1.000 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Success +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: SB spills = 54 tensors +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: size = 82432 bytes/partition +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: remats = 1 tensors +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: unpinned = 0 tensors +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: size = 0 bytes/partition +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: SB score = 992889 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: best SB heuristic = 0 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: collect spills +2025-08-07T13:54:03Z INFO 49414 (sg01) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z INFO 49414 (sg01) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: constant_propagate finished after 0.132 seconds +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 588mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4625 memory location(s), 
1 block(s), and 20003 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: Running lower_ac +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: insert spills +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to lower_ac: modules=1 functions=1 allocs=4625 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z INFO 49414 (sg01) [LowerAC]: INFO (LowerAC) Lowered 0 loads, 0 saves, 0 copies. +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: lower_ac finished after 0.004 seconds +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 588mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4625 memory location(s), 1 block(s), and 20003 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: Running input_dma_coalescing +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to input_dma_coalescing: modules=1 functions=1 allocs=4625 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: deleting loads #loadsToDelete: 0 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: deleting locs #locationsToDelete: 0 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: locationsToDelete done +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: main loop +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: renumber locations +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: size = 3047 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: find partners +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: found 755 accumulation groups +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: largest = custom-call.226.1608_i1 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: tensors = 33 +2025-08-07T13:54:03Z INFO 49414 (sg00) 
[SB_Allocator]: requires 66048 bytes/partition +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: expanding partners +2025-08-07T13:54:03Z INFO 49414 (sg01) [DMAOptimizationBase]: DMA input Coalescing combined 0 input loads +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: input_dma_coalescing finished after 0.011 seconds +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 589mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 []: find first defs for local +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4625 memory location(s), 1 block(s), and 20003 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: Running remat_optimization +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to remat_optimization: modules=1 functions=1 allocs=4625 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z INFO 49414 []: find first defs for global +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: find loads +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: 1 pin count +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: 531 remat count +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: 1 pinned tensors will require about 16384 bytes/partition +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: build interference graph +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: pass 1 int-tree +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Num intervals 3047 Num locations 3047 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: IntervalTree Build Done +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: info.neighbors init Done +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: info.neighbors partners Done +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: IntervalTree readback Done +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: edge: 203902 
+2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: mean: 133.838 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: median: 85.5276 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: find costs +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: best-of-n loop, heuristic = 0 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: simplify interference graph +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: initialize safe & unsafe +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: safe = 72 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: unsafe = 12 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: inf = 400 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: total = 484 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: simplify +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: simplify_step3_sorted2 #Unsafe 6 #Pinned 0 #Safe 0 minCost 0.017119 maxCost 0.017119 locations 3047 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: new candidates = 2 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: (including 355 infinite cost tensors) +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: select ranges +2025-08-07T13:54:03Z INFO 49414 (sg01) [RematOpt]: Removed 0 remat instructions +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: remat_optimization finished after 0.014 seconds +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 589mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4625 memory location(s), 1 block(s), and 20003 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z USER 49414 (sg01) [ModuleForkPass]: Running early_peephole_opts +2025-08-07T13:54:03Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to early_peephole_opts: modules=1 functions=1 allocs=4625 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:03Z INFO 49414 (sg01) [EarlyPeepholeOpts]: PeepholeOpts enabled? ActivationAccumulate: true +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Total: 484 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Spilled: 0.000 (0) +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Allocated: 1.000 (484) +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Rover zone: 0.593 (287) +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Pre-rover zone: 0.021 (10) +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Post-rover zone: 0.386 (187) +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Slice zone: 0.000 (0) +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Blocks nothing: 0.000 (0) +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Blocks medium: 0.000 (0) +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Blocks tall: 1.000 (484) +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Visited until tall blocking (mean): 1.000 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Visited until tall blocking (median): 1.000 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Visited until tall blocking (p95): 1.000 +2025-08-07T13:54:03Z INFO 49414 (sg00) [SB_Allocator]: Success +2025-08-07T13:54:04Z INFO 49414 (sg00) [SB_Allocator]: SB spills = 0 tensors +2025-08-07T13:54:04Z INFO 49414 (sg00) [SB_Allocator]: size = 0 bytes/partition +2025-08-07T13:54:04Z INFO 49414 (sg00) [SB_Allocator]: remats = 0 tensors +2025-08-07T13:54:04Z INFO 49414 (sg00) [SB_Allocator]: unpinned = 0 tensors +2025-08-07T13:54:04Z INFO 49414 (sg00) [SB_Allocator]: size = 0 bytes/partition +2025-08-07T13:54:04Z INFO 49414 (sg00) 
[SB_Allocator]: SB score = 0 +2025-08-07T13:54:04Z INFO 49414 (sg00) [SB_Allocator]: spilling from SB cost about 992889 cycles +2025-08-07T13:54:04Z INFO 49414 (sg00) [SB_Allocator]: number of tensors spilled from SB = 54 +2025-08-07T13:54:04Z INFO 49414 (sg00) [SB_Allocator]: total size of spilled tensors = 82432 bytes/partition +2025-08-07T13:54:04Z INFO 49414 (sg00) [SB_Allocator]: 16384 bytes/partition (100%) successfully pinned +2025-08-07T13:54:04Z INFO 49414 (sg00) [SB_Allocator]: pinning saved approximately 9010 cycles +2025-08-07T13:54:04Z INFO 49414 (sg00) [SB_Allocator]: 0% SB utilization after allocation +2025-08-07T13:54:04Z INFO 49414 (sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes loaded 145240832 +2025-08-07T13:54:04Z INFO 49414 (sg00) [ColoringAllocator::Rep]: INFO: Post GCA average loaded DMA size 1809 bytes +2025-08-07T13:54:04Z INFO 49414 (sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes saved 40960002 +2025-08-07T13:54:04Z INFO 49414 (sg00) [ColoringAllocator::Rep]: INFO: Post GCA average saved DMA size 1893 bytes +2025-08-07T13:54:04Z INFO 49414 (sg00) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 6336512 +2025-08-07T13:54:04Z INFO 49414 (sg00) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 307 bytes +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: coloring_allocator_sb finished after 0.084 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 589mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4228 memory location(s), 1 block(s), and 8770 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running address_rotation_sb +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=4228 blocks=1 instructions=8770 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z INFO 49414 (sg01) [EarlyPeepholeOpts]: Activation Accumulate: 256 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: address_rotation_sb finished after 0.006 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 589mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4228 memory location(s), 1 block(s), and 8770 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running dma_optimization_sb +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to dma_optimization_sb: modules=1 functions=1 allocs=4228 blocks=1 instructions=8770 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: DMA optimization In bytes loaded or saved 186200834, 31.5782% input load, 5.34985% output write, 63.072% spill/reload [sg0000] +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: early_peephole_opts finished after 0.016 seconds +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 589mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4625 memory location(s), 1 block(s), and 20003 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running coalesce_multichannel_cc_ops +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to coalesce_multichannel_cc_ops: modules=1 functions=1 allocs=4625 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: removed 0 identical load +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: coalesce_multichannel_cc_ops finished after 0.003 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: adjusted 0 DMACopy remat +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: adjusted 0 DMACopy remat +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 589mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4625 memory location(s), 1 block(s), and 20003 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running infer_stream_ids +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to infer_stream_ids: modules=1 functions=1 allocs=4625 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z INFO 49414 (sg02) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: sub-graph will get execute 1 times +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [Load Merging]: removed 0 remat/cloned instructions +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: infer_stream_ids finished after 0.002 seconds +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 589mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4625 memory location(s), 1 block(s), and 20003 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running pre_sched +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to pre_sched: modules=1 functions=1 allocs=4625 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: Start PRE scheduling 2 cores: 1 at: Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 [LayerSpiller]: LayerSpill: Start... +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [Load shrink]: shrinked 0 GCA remat/cloned instructions +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [Load Merging + Load shrink] reduced input/const loading DMA traffic 1572864, 0.844714% out of total dma traffic(5.87988e+07) +2025-08-07T13:54:04Z INFO 49414 [LayerSpiller]: LayerSpill: Found 2 Splits CCs +2025-08-07T13:54:04Z INFO 49414 [LayerSpiller]: Grouped CCs to 2 clusters. +2025-08-07T13:54:04Z INFO 49414 [LayerSpiller]: LayerSpill: To Spill 0 multi-layer tensors +2025-08-07T13:54:04Z INFO 49414 [LayerSpiller]: LayerSpill: set uninit flag on 0 insts +2025-08-07T13:54:04Z INFO 49414 [LayerSpiller]: LayerSpill: Done. 
+2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: Start split live ranges Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 237 spill/reload instructions +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 232 spill/reload memory locations +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [spill optimization round 1]: removed 20 spill/reload instructions +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [spill optimization round 1]: removed 20 spill/reload memory locations +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: Num_Splits: 0 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: End split live ranges Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: Strt remove redundncies Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: remove_redundant_memsets +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [spill optimization round 2]: removed 4 spill/reload instructions +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [spill optimization round 2]: removed 4 spill/reload memory locations +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: remove_redundant_memsets: 0 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: remove_redundant_loads +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [spill optimization round 3]: removed 0 spill/reload instructions +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [spill optimization round 3]: removed 0 spill/reload memory locations +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 38928384, 33.1473% out of total spill/reload dma traffic +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload instructions +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: 
[Allocation optimization]: removed 0 spill/reload memory locations +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [Re-allocation Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: remove_redundant_loads: 0 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: End remove redundncies Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: Start DCE Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [remove extra save] removed 0 memlocs and 0 instructions +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [remove_memset_spill]: removed 2 spill/reload instructions +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [remove_memset_spill]: removed 1 spill/reload memory locations +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 0 combined 52 SpillSaves and Reloads +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: average loaded DMA size 2508 bytes +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: average saved DMA size 2061 bytes +2025-08-07T13:54:04Z INFO 49414 (sg02) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 1 combined 4 SpillSaves and Reloads +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: 
average loaded DMA size 2532 bytes +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: average saved DMA size 2075 bytes +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes loaded 105984768 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: INFO: Post DMA coalescing average loaded DMA size 2532 bytes +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes saved 39583746 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: INFO: Post DMA coalescing average saved DMA size 2075 bytes +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [Experiment partial DMA access] reduced DMA traffic 3637248, 3.0971% out of total spill/reload dma traffic +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: [DMA optimization] reduced DMA traffic 44138496, 23.7048% out of total dma traffic +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: DMA optimization Out bytes loaded or saved 142062338, 41.3895% input load, 7.01204% output write, 51.5985% spill/reload [sg0000] +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes loaded 104166144 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average loaded DMA size 2488 bytes +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes saved 37896194 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average saved DMA size 1986 bytes +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes DMAcopyed 6336512 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average DMAcopyed DMA size 307 bytes 
+2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: INFO: Post DMA optimization average DMA size 1818 bytes +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: INFO: Finished set_spill_canreadUninit(module); +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: DMA optimization re-enable optimization +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: dma_optimization_sb finished after 0.079 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 590mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running address_rotation_sb +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: SB Rotation rotated 5 Sb address +2025-08-07T13:54:04Z INFO 49414 (sg02) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: SB Rotation rotated 90 Sb address +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: SB Rotation rotated 80 Sb address +2025-08-07T13:54:04Z INFO 49414 (sg02) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: SB Rotation rotated 31 Sb address +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: 
remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: End DCE Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg02) [ConstantPropagate]: [Constant_propagate for Affineselect] directly remove instruction number: 0 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: Start build flow dependencies Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg01) [build_flow_deps]: Start build fdeps. Invocation: 2Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg01) [build_flow_deps]: Allocs: 4625 instructions: 20003 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: SB Rotation rotated 680 Sb address +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: address_rotation_sb finished after 0.057 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 590mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running coloring_allocator_dram +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to coloring_allocator_dram: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z INFO 49414 (sg00) [ColoringAllocator::Rep]: Allocating functions +2025-08-07T13:54:04Z INFO 49414 (sg00) [ColoringAllocator::Rep]: linearize and check +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Local +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: reserved space = 686899974 bytes +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: spill space = 39321600 bytes +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: aligned spill space = 39321600 bytes +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: renumber locations +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: size = 44 +2025-08-07T13:54:04Z INFO 49414 []: find first defs for local +2025-08-07T13:54:04Z INFO 49414 []: find first defs for global +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: Num intervals 44 Num locations 44 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: IntervalTree Build Done +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: info.neighbors init Done +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: IntervalTree readback Done +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: simplify interference graph +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: initialize low and high +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: lo = 44 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: hi = 0 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: total = 44 +2025-08-07T13:54:04Z INFO 49414 (sg00) 
[DRAM_Allocator]: simplify +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: new candidates = 0 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: select ranges +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: allreduce_dram_hwm 29360128 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: Real CC buffer size 29360128 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: DRAM hwm after allocation: 39321600 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DRAM_Allocator]: DRAM allocation successful +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: coloring_allocator_dram finished after 0.008 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 591mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running address_rotation_dram +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to address_rotation_dram: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: Runtime page size at 512MB +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: DRAM hwm before rotation 39321600 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: allreduce buffer size 524288000 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: allreduce hwm 29360128 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: Real CC buffer size 29360128 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: DRAM hwm after rotation 39321600 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: DRAM Rotation rotated 0 Dram address +2025-08-07T13:54:04Z USER 49414 (sg00) 
[ModuleForkPass]: address_rotation_dram finished after 0.004 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 591mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running tensorcopy_accel +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to tensorcopy_accel: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z INFO 49414 (sg00) [TensorCopyAccel::Impl]: Running peephole optimization pass +2025-08-07T13:54:04Z INFO 49414 (sg00) [TensorCopyAccel::Impl]: Accelerated 0 out of 843 tensorcopy in Function: sg0000 average acceleration factor: -nan +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: tensorcopy_accel finished after 0.001 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 591mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running peephole_opts +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to peephole_opts: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z INFO 49414 (sg00) [PeepholeOpts]: PeepholeOpts enabled? 
Recip: true Tsp: true Tc: false SplitSelect: true SimplifyMemset true +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: peephole_opts finished after 0.008 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 591mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running lower_kernel +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to lower_kernel: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z INFO 49414 (sg00) [LowerKernel]: Started running LowerKernel +2025-08-07T13:54:04Z INFO 49414 (sg00) [LowerKernel]: Start of kernel lowering pass, number of insts: 8451, number of allocs: 3899 +2025-08-07T13:54:04Z INFO 49414 (sg00) [LowerKernel]: Scan BKs time (s): 0.062925 +2025-08-07T13:54:04Z INFO 49414 (sg00) [LowerKernel]: Lower BKs time (s): 0.000264 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: lower_kernel finished after 0.001 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 591mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running lower_nki_kernel +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: lower_nki_kernel finished after 0.000 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 591mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running dynamic_dma_cleanup +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to dynamic_dma_cleanup: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: dynamic_dma_cleanup finished after 0.001 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 591mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running birverifier +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: birverifier finished after 0.006 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 591mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running dynamic_dma_scan +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to dynamic_dma_scan: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: dynamic_dma_scan finished after 0.001 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 591mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z INFO 49414 (sg02) [ConstantPropagate]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running build_fdeps +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to build_fdeps: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z INFO 49414 (sg00) [build_flow_deps]: Start build fdeps. 
Invocation: 3Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg00) [build_flow_deps]: Allocs: 3899 instructions: 8451 +2025-08-07T13:54:04Z INFO 49414 (sg01) [build_flow_deps]: Build fdeps inserted 61350 edges +2025-08-07T13:54:04Z INFO 49414 (sg01) [build_flow_deps]: Done build fdeps 61350 Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: End build flow dependencies Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: Start remove useless insts Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: remove_useless_insts +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: remove Useless Instructions: 0 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: End remove useless insts Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: Start scratchpad optimization Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: End scratchpad optimization Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg00) [build_flow_deps]: Build fdeps inserted 20910 edges +2025-08-07T13:54:04Z INFO 49414 (sg00) [build_flow_deps]: Done build fdeps 20910 Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: build_fdeps finished after 0.014 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 594mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running remove_redundancies +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to remove_redundancies: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z INFO 49414 (sg00) [RemoveRedundancies]: remove_clobbered_writes +2025-08-07T13:54:04Z INFO 49414 (sg00) [RemoveRedundancies]: remove_clobbered_writes: 0 +2025-08-07T13:54:04Z INFO 49414 (sg00) [RemoveRedundancies]: remove_useless_insts +2025-08-07T13:54:04Z INFO 49414 (sg00) [RemoveRedundancies]: remove Useless Instructions: 0 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: remove_redundancies finished after 0.002 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 594mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running anti_dependency_analyzer +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z INFO 49414 (sg00) [AntiDependencyAnalyzer]: Batch size: 1000 +2025-08-07T13:54:04Z INFO 49414 (sg00) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-08-07T13:54:04Z INFO 49414 (sg00) [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PreSched]: DONE PRE scheduling Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: pre_sched finished after 0.191 seconds +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 594mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4625 memory location(s), 1 block(s), and 20003 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running tensor_copy_elim +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=4625 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z INFO 49414 (sg01) [TensorCopyElim]: Tensor CP elimination: 0 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: anti_dependency_analyzer finished after 0.031 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 594mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running tensor_copy_elim +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z INFO 49414 (sg00) [TensorCopyElim]: Tensor CP elimination: 0 +2025-08-07T13:54:04Z INFO 49414 (sg00) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:04Z INFO 49414 (sg02) [ConstantPropagate]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:04Z INFO 49414 (sg01) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: tensor_copy_elim finished after 0.010 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 594mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running prefetch_scheduling_before_sched +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to prefetch_scheduling_before_sched: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: prefetch_scheduling_before_sched finished after 0.000 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 594mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running post_sched +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to post_sched: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z INFO 49414 [post_scheduler]: Start PosT ScheD 3 sunda Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg01) [TensorCopyElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:04Z INFO 49414 (sg01) [TensorCopyElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:04Z INFO 49414 (sg01) [TensorCopyElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: tensor_copy_elim finished after 0.036 seconds +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 594mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4625 memory location(s), 1 block(s), and 20003 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running dynamic_dma_setup +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to dynamic_dma_setup: modules=1 functions=1 allocs=4625 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: dynamic_dma_setup finished after 0.000 seconds +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 594mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4626 memory location(s), 1 block(s), and 20003 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running runtime_memory_reservation +2025-08-07T13:54:04Z INFO 49414 (sg02) [ConstantPropagate]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to runtime_memory_reservation: modules=1 functions=1 allocs=4626 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: runtime_memory_reservation finished after 0.000 seconds +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 594mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4626 memory location(s), 1 block(s), and 20003 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running coloring_allocator_psum +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to coloring_allocator_psum: modules=1 functions=1 allocs=4626 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z INFO 49414 (sg01) [ColoringAllocator::Rep]: Allocating functions +2025-08-07T13:54:04Z INFO 49414 (sg01) [ColoringAllocator::Rep]: linearize and check +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: allocating PSUM +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: main loop +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: renumber locations +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: size = 1228 +2025-08-07T13:54:04Z INFO 49414 (sg02) [ConstantPropagate]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-08-07T13:54:04Z USER 49414 (sg02) [ModuleForkPass]: constant_propagate finished after 0.333 seconds +2025-08-07T13:54:04Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 594mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 
module(s), 1 function(s), 11794 memory location(s), 1 block(s), and 60307 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:04Z USER 49414 (sg02) [ModuleForkPass]: Running lower_ac +2025-08-07T13:54:04Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to lower_ac: modules=1 functions=1 allocs=11794 blocks=1 instructions=60307 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: build_no_bitmap start +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: 100% PSUM demand before spilling +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: PSUM high-water mark = 8 tensors +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: found 3574 edges +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: mean: 5.82085 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: median: 6.99819 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: adjacency vectors require 28592 bytes +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: build_no_bitmap done +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: find costs +2025-08-07T13:54:04Z INFO 49414 (sg02) [LowerAC]: INFO (LowerAC) Lowered 0 loads, 0 saves, 0 copies. +2025-08-07T13:54:04Z USER 49414 (sg02) [ModuleForkPass]: lower_ac finished after 0.013 seconds +2025-08-07T13:54:04Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 594mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11794 memory location(s), 1 block(s), and 60307 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:04Z USER 49414 (sg02) [ModuleForkPass]: Running input_dma_coalescing +2025-08-07T13:54:04Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to input_dma_coalescing: modules=1 functions=1 allocs=11794 blocks=1 instructions=60307 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:04Z INFO 49414 [post_scheduler]: Time-aware hwm post-sched +2025-08-07T13:54:04Z INFO 49414 (sg02) [DMAOptimizationBase]: DMA input Coalescing combined 0 input loads +2025-08-07T13:54:04Z USER 49414 (sg02) [ModuleForkPass]: input_dma_coalescing finished after 0.026 seconds +2025-08-07T13:54:04Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 595mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11794 memory location(s), 1 block(s), and 60307 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:04Z USER 49414 (sg02) [ModuleForkPass]: Running remat_optimization +2025-08-07T13:54:04Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to remat_optimization: modules=1 functions=1 allocs=11794 blocks=1 instructions=60307 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: best-of-n loop, heuristic = 0, allow_psum_spill_within_accum_group = false +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: simplify interference graph +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: initialize low and high +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: lo = 1228 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: hi = 0 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: inf = 0 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: total = 1228 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: simplify +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: new candidates = 0 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: select ranges +2025-08-07T13:54:04Z INFO 49414 
(sg01) [PSUM_Allocator]: no more spills +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: PSUM score = 0 (lower is better) +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: spilling from PSUM cost about 0 cycles +2025-08-07T13:54:04Z INFO 49414 (sg01) [PSUM_Allocator]: 100% PSUM utilization after allocation +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: coloring_allocator_psum finished after 0.067 seconds +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 596mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4626 memory location(s), 1 block(s), and 20003 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running dma_optimization_psum +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to dma_optimization_psum: modules=1 functions=1 allocs=4626 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload instructions +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload memory locations +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: dma_optimization_psum finished after 0.010 seconds +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 596mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4626 memory location(s), 1 block(s), and 20003 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running address_rotation_psum +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to address_rotation_psum: modules=1 functions=1 allocs=4626 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: PSUM Rotation rotated 32 PSUM Banks +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: PSUM Rotation rotated 16 PSUM Banks +2025-08-07T13:54:04Z INFO 49414 [post_scheduler]: Time-aware simulation time: 2192337 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: PSUM Rotation rotated 2 PSUM Banks +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: address_rotation_psum finished after 0.040 seconds +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 598mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4626 memory location(s), 1 block(s), and 20003 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running coloring_allocator_sb +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to coloring_allocator_sb: modules=1 functions=1 allocs=4626 blocks=1 instructions=20003 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z INFO 49414 [post_scheduler]: Done PosT ScheD Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: post_sched finished after 0.130 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 598mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running expand_scheduling_units +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to expand_scheduling_units: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z INFO 49414 (sg01) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes loaded 279003648 +2025-08-07T13:54:04Z INFO 49414 (sg01) [ColoringAllocator::Rep]: INFO: Pre GCA average loaded DMA size 1083 bytes +2025-08-07T13:54:04Z INFO 49414 (sg01) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes saved 25165826 +2025-08-07T13:54:04Z INFO 49414 (sg01) [ColoringAllocator::Rep]: INFO: Pre GCA average saved DMA size 2457 bytes +2025-08-07T13:54:04Z INFO 49414 (sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 2129920 +2025-08-07T13:54:04Z INFO 49414 (sg01) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 130 bytes +2025-08-07T13:54:04Z INFO 49414 (sg01) [ColoringAllocator::Rep]: Allocating functions +2025-08-07T13:54:04Z INFO 49414 (sg01) [ColoringAllocator::Rep]: linearize and check +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: expand_scheduling_units finished after 0.001 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 598mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running address_rotation_sb +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: allocating SB +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: main loop +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: renumber locations +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: size = 3365 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: find partners +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: found 1176 accumulation groups +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: largest = _dot.6-t1042_i41 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: tensors = 96 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: requires 98304 bytes/partition +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: expanding partners +2025-08-07T13:54:04Z INFO 49414 (sg02) [RematOpt]: Removed 0 remat instructions +2025-08-07T13:54:04Z USER 49414 (sg02) [ModuleForkPass]: remat_optimization finished after 0.099 seconds +2025-08-07T13:54:04Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 603mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11794 memory location(s), 1 block(s), and 60307 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:04Z USER 49414 (sg02) [ModuleForkPass]: Running early_peephole_opts +2025-08-07T13:54:04Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to early_peephole_opts: modules=1 functions=1 allocs=11794 blocks=1 instructions=60307 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:04Z INFO 49414 (sg02) [EarlyPeepholeOpts]: PeepholeOpts enabled? 
ActivationAccumulate: true +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: PSUM Rotation rotated 385 PSUM Banks +2025-08-07T13:54:04Z INFO 49414 []: find first defs for local +2025-08-07T13:54:04Z INFO 49414 []: find first defs for global +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: PSUM Rotation rotated 368 PSUM Banks +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: find loads +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: PSUM Rotation rotated 201 PSUM Banks +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: 1 pin count +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: 410 remat count +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: 1 pinned tensors will require about 16384 bytes/partition +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: build interference graph +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: pass 1 int-tree +2025-08-07T13:54:04Z INFO 49414 (sg02) [EarlyPeepholeOpts]: Activation Accumulate: 0 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: SB Rotation rotated 11 Sb address +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Num intervals 3365 Num locations 3365 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: IntervalTree Build Done +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: info.neighbors init Done +2025-08-07T13:54:04Z USER 49414 (sg02) [ModuleForkPass]: early_peephole_opts finished after 0.029 seconds +2025-08-07T13:54:04Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 606mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11794 memory location(s), 1 block(s), and 60307 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:04Z USER 49414 (sg02) [ModuleForkPass]: Running coalesce_multichannel_cc_ops +2025-08-07T13:54:04Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to coalesce_multichannel_cc_ops: modules=1 functions=1 allocs=11794 blocks=1 instructions=60307 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: SB Rotation rotated 10 Sb address +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: info.neighbors partners Done +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: IntervalTree readback Done +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: edge: 369197 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: mean: 219.434 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: median: 142.66 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: find costs +2025-08-07T13:54:04Z USER 49414 (sg02) [ModuleForkPass]: coalesce_multichannel_cc_ops finished after 0.007 seconds +2025-08-07T13:54:04Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 606mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11794 memory location(s), 1 block(s), and 60307 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:04Z USER 49414 (sg02) [ModuleForkPass]: Running infer_stream_ids +2025-08-07T13:54:04Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to infer_stream_ids: modules=1 functions=1 allocs=11794 blocks=1 instructions=60307 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: SB Rotation rotated 93 Sb address +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: best-of-n loop, heuristic = 0 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: simplify interference graph +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: initialize safe & unsafe +2025-08-07T13:54:04Z USER 49414 (sg02) [ModuleForkPass]: infer_stream_ids finished after 0.007 seconds +2025-08-07T13:54:04Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 606mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11794 memory location(s), 1 block(s), and 60307 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:04Z USER 49414 (sg02) [ModuleForkPass]: Running pre_sched +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: safe = 442 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: unsafe = 2024 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: inf = 898 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: total = 3364 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: simplify +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: simplify_step3_sorted2 #Unsafe 2013 #Pinned 0 #Safe 0 minCost 0.000602867 maxCost 0.0489795 locations 3365 +2025-08-07T13:54:04Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to pre_sched: modules=1 functions=1 allocs=11794 blocks=1 instructions=60307 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: Start PRE scheduling 2 cores: 1 at: Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 [LayerSpiller]: LayerSpill: Start... 
+2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: new candidates = 600 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: select ranges +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: SB Rotation rotated 59 Sb address +2025-08-07T13:54:04Z INFO 49414 [LayerSpiller]: LayerSpill: Found 1 Splits CCs +2025-08-07T13:54:04Z INFO 49414 [LayerSpiller]: Grouped CCs to 1 clusters. +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: SB Rotation rotated 381 Sb address +2025-08-07T13:54:04Z INFO 49414 [LayerSpiller]: LayerSpill: To Spill 0 multi-layer tensors +2025-08-07T13:54:04Z INFO 49414 [LayerSpiller]: LayerSpill: set uninit flag on 0 insts +2025-08-07T13:54:04Z INFO 49414 [LayerSpiller]: LayerSpill: Done. +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: Start split live ranges Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Total: 3364 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Spilled: 0.043 (144) +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Allocated: 0.957 (3220) +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Rover zone: 0.344 (1107) +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Pre-rover zone: 0.005 (16) +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Post-rover zone: 0.651 (2097) +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Slice zone: 0.000 (0) +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Blocks nothing: 0.000 (0) +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Blocks medium: 0.000 (0) +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Blocks tall: 1.000 (3220) +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Visited until tall blocking (mean): 1.000 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Visited until tall blocking (median): 1.000 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Visited until tall blocking (p95): 1.000 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: 
Success +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: SB spills = 144 tensors +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: size = 196608 bytes/partition +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: remats = 0 tensors +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: unpinned = 0 tensors +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: size = 0 bytes/partition +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: SB score = 1.52973e+06 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: best SB heuristic = 0 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: collect spills +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-08-07T13:54:04Z INFO 49414 (sg00) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: address_rotation_sb finished after 0.117 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 608mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running anti_dependency_analyzer +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z INFO 49414 (sg00) [AntiDependencyAnalyzer]: Batch size: 1000 +2025-08-07T13:54:04Z INFO 49414 (sg00) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-08-07T13:54:04Z INFO 49414 (sg00) [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: insert spills +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: deleting loads #loadsToDelete: 0 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: deleting locs #locationsToDelete: 0 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: locationsToDelete done +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: Num_Splits: 0 +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: End split live ranges Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: Strt remove redundncies Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: remove_redundant_memsets +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: main loop +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: renumber locations +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: size = 3899 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: find partners +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: anti_dependency_analyzer finished after 0.033 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 613mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running anti_dependency_analyzer +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z INFO 49414 (sg00) [AntiDependencyAnalyzer]: Batch size: 1000 +2025-08-07T13:54:04Z INFO 49414 (sg00) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS} +2025-08-07T13:54:04Z INFO 49414 (sg00) [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: found 1176 accumulation groups +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: largest = _dot.6-t1042_i41 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: tensors = 96 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: requires 98304 bytes/partition +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: expanding partners +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: anti_dependency_analyzer finished after 0.005 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 613mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running dep_opt +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to dep_opt: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z INFO 49414 (sg00) [build_flow_deps]: Start build fdeps. 
Invocation: 4Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg00) [build_flow_deps]: Allocs: 3899 instructions: 8451 +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: remove_redundant_memsets: 2 +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: remove_redundant_loads +2025-08-07T13:54:04Z INFO 49414 (sg00) [build_flow_deps]: Build fdeps inserted 20242 edges +2025-08-07T13:54:04Z INFO 49414 (sg00) [build_flow_deps]: Done build fdeps 20242 Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: dep_opt finished after 0.027 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 613mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: Running report_stats +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to report_stats: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z INFO 49414 (sg00) [ReportStats]: Data Movement Statistics: sg0000 +┌──────────────┬────────────────────────────┬───────┬────────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├──────────────┼────────────────────────────┼───────┼────────────┤ +│ DMACopy │ ExternalInput -> Internal │ 17 │ 9957281792 │ +│ DMACopy │ Internal -> ExternalOutput │ 64 │ 67108864 │ +│ DMACopy │ Internal -> Output │ 1 │ 16777216 │ +│ Load │ Const -> Internal │ 3 │ 65792 │ +│ Load │ ExternalInput -> Internal │ 148 │ 58733056 │ +│ Load │ Internal │ 178 │ 45367296 │ +│ Save │ Internal │ 62 │ 16252928 │ +│ Save │ Internal -> Output │ 37 │ 9961474 │ +│ Save (Spill) │ Internal │ 51 │ 11681792 │ +└──────────────┴────────────────────────────┴───────┴────────────┘ + +2025-08-07T13:54:04Z INFO 49414 (sg00) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per 
partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 3 │ +│ 64 │ 1 │ +│ 256 │ 3 │ +│ 512 │ 1 │ +│ 896 │ 6 │ +│ 1024 │ 46 │ +│ 1920 │ 32 │ +│ 2048 │ 288 │ +│ 4096 │ 116 │ +│ 262144 │ 64 │ +│ 8388608 │ 2 │ +└─────────────────────┴───────┘ + +2025-08-07T13:54:04Z INFO 49414 (sg00) [ReportStats]: MM Stats: #MatMults 4368 #MatMult-Transposes 1312 +2025-08-07T13:54:04Z INFO 49414 (sg00) [ReportStats]: IO Tensor size combined: 668484100 +2025-08-07T13:54:04Z INFO 49414 (sg00) [ReportStats]: IO Tensor Statistics: +┌────────────────────┬────────────────┬──────────┬──────────────┐ +│ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────┼────────────────┼──────────┼──────────────┤ +│ input76 │ ExternalInput │ bfloat16 │ 622329856 │ +│ input77 │ ExternalInput │ bfloat16 │ 16777216 │ +│ input83 │ ExternalInput │ bfloat16 │ 16777216 │ +│ input78 │ ExternalInput │ bfloat16 │ 4194304 │ +│ input81 │ ExternalInput │ bfloat16 │ 4194304 │ +│ input5 │ ExternalInput │ bfloat16 │ 1048576 │ +│ input4 │ ExternalInput │ bfloat16 │ 1048576 │ +│ output1 │ ExternalOutput │ bfloat16 │ 1048576 │ +│ output2 │ ExternalOutput │ bfloat16 │ 1048576 │ +│ input79 │ ExternalInput │ bfloat16 │ 8192 │ +└────────────────────┴────────────────┴──────────┴──────────────┘ + +2025-08-07T13:54:04Z INFO 49414 (sg00) [ReportStats]: Large (Internal) Tensor Statistics: +┌───────────────────────────┬──────────┬──────────┬──────────────┐ +│ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ +├───────────────────────────┼──────────┼──────────┼──────────────┤ +│ intermediate1 │ Output │ bfloat16 │ 8388608 │ +│ intermediate4-buffer-2240 │ Internal │ bfloat16 │ 8388608 │ +│ intermediate4 │ Output │ bfloat16 │ 8388608 │ +│ dot.4-buffer-2238 │ Internal │ bfloat16 │ 8388608 │ +│ all_gather.1_i0 │ Internal │ bfloat16 │ 4194304 │ +│ all_gather.1_i1 │ Internal │ bfloat16 │ 4194304 │ +│ transpose.1_i1 │ Internal │ bfloat16 │ 2097152 │ +│ DynamicDMAScratchLoc │ Internal │ uint8 │ 2097152 │ +│ 
transpose.1_i0 │ Internal │ bfloat16 │ 2097152 │ +│ intermediate0 │ Output │ uint8 │ 1048576 │ +└───────────────────────────┴──────────┴──────────┴──────────────┘ + +2025-08-07T13:54:04Z USER 49414 (sg00) [ModuleForkPass]: report_stats finished after 0.002 seconds +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 613mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:04Z INFO 49414 []: find first defs for local +2025-08-07T13:54:04Z INFO 49414 []: find first defs for global +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: remove_redundant_loads: 0 +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: End remove redundncies Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: Start DCE Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: find loads +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: 1 pin count +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: 880 remat count +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: 1 pinned tensors will require about 16384 bytes/partition +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: build interference graph +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: pass 1 int-tree +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Num intervals 3899 Num locations 3899 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: IntervalTree Build Done +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: info.neighbors init Done +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: info.neighbors partners Done +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: IntervalTree readback Done +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: edge: 309665 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: mean: 158.843 
+2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: median: 113.474 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: find costs +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: best-of-n loop, heuristic = 0 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: simplify interference graph +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: initialize safe & unsafe +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: safe = 257 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: unsafe = 12 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: inf = 409 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: total = 678 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: simplify +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: simplify_step3_sorted2 #Unsafe 10 #Pinned 0 #Safe 0 minCost 0.012868 maxCost 0.0320175 locations 3899 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: new candidates = 9 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: (including 409 infinite cost tensors) +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: select ranges +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Total: 678 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Spilled: 0.000 (0) +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Allocated: 1.000 (678) +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Rover zone: 0.618 (419) +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Pre-rover zone: 0.013 (9) +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Post-rover zone: 0.369 (250) +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Slice zone: 0.000 (0) +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Blocks nothing: 0.000 (0) +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Blocks medium: 0.000 (0) +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Blocks tall: 1.000 (678) +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Visited until tall blocking 
(mean): 1.000 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Visited until tall blocking (median): 1.000 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Visited until tall blocking (p95): 1.000 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: Success +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: SB spills = 0 tensors +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: size = 0 bytes/partition +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: remats = 0 tensors +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: unpinned = 0 tensors +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: size = 0 bytes/partition +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: SB score = 0 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: spilling from SB cost about 1.52973e+06 cycles +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: number of tensors spilled from SB = 144 +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: total size of spilled tensors = 196608 bytes/partition +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: 16384 bytes/partition (100%) successfully pinned +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: pinning saved approximately 9010 cycles +2025-08-07T13:54:04Z INFO 49414 (sg01) [SB_Allocator]: 0% SB utilization after allocation +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:04Z INFO 49414 (sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes loaded 380453376 +2025-08-07T13:54:04Z INFO 49414 (sg01) [ColoringAllocator::Rep]: INFO: Post GCA average loaded DMA size 1167 bytes +2025-08-07T13:54:04Z INFO 49414 (sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes saved 77594626 +2025-08-07T13:54:04Z INFO 49414 (sg01) [ColoringAllocator::Rep]: INFO: Post GCA average saved DMA size 2104 bytes +2025-08-07T13:54:04Z INFO 49414 (sg01) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 2129920 
+2025-08-07T13:54:04Z INFO 49414 (sg01) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 130 bytes +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: coloring_allocator_sb finished after 0.249 seconds +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 613mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 5304 memory location(s), 1 block(s), and 20745 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running address_rotation_sb +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=5304 blocks=1 instructions=20745 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: address_rotation_sb finished after 0.014 seconds +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 613mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 5304 memory location(s), 1 block(s), and 20745 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running dma_optimization_sb +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to dma_optimization_sb: modules=1 functions=1 allocs=5304 blocks=1 instructions=20745 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: DMA optimization In bytes loaded or saved 458048002, 57.2487% input load, 1.83138% output write, 40.9199% spill/reload [sg0001] +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: removed 0 identical load +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: adjusted 0 DMACopy remat +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: adjusted 0 DMACopy remat +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: sub-graph will get execute 35 times +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [Load Merging]: removed 0 remat/cloned instructions +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [Load shrink]: shrinked 0 GCA remat/cloned instructions +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [Load Merging + Load shrink] reduced input/const loading DMA traffic 0, 0% out of total dma traffic(2.62226e+08) +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 356 spill/reload instructions +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 312 spill/reload memory locations +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys 
+2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: End DCE Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [spill optimization round 1]: removed 40 spill/reload instructions +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [spill optimization round 1]: removed 34 spill/reload memory locations +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [spill optimization round 2]: removed 0 spill/reload instructions +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [spill optimization round 2]: removed 0 spill/reload memory locations +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 93585408, 49.9301% out of total spill/reload dma traffic +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload instructions +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload memory locations +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [Re-allocation Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: Start build flow dependencies Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg02) [build_flow_deps]: Start build fdeps. 
Invocation: 5Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [remove extra save] removed 0 memlocs and 0 instructions +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload instructions +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [remove_memset_spill]: removed 0 spill/reload memory locations +2025-08-07T13:54:04Z INFO 49414 (sg02) [build_flow_deps]: Allocs: 11794 instructions: 60305 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 0 combined 228 SpillSaves and Reloads +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: average loaded DMA size 1133 bytes +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: average saved DMA size 2302 bytes +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 1 combined 88 SpillSaves and Reloads +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: average loaded DMA size 1144 bytes +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: average saved DMA size 2621 bytes +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 2 combined 0 SpillSaves and Reloads +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: average loaded DMA size 1144 bytes +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: average saved DMA size 2621 bytes 
+2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes loaded 311116288 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing average loaded DMA size 1144 bytes +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes saved 53346306 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: INFO: Post DMA coalescing average saved DMA size 2621 bytes +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [Experiment partial DMA access] reduced DMA traffic 1441792, 0.769231% out of total spill/reload dma traffic +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: [DMA optimization] reduced DMA traffic 95027200, 20.7461% out of total dma traffic +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: DMA optimization Out bytes loaded or saved 363020802, 72.2345% input load, 2.31078% output write, 25.4547% spill/reload [sg0001] +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes loaded 310395392 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average loaded DMA size 1145 bytes +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes saved 52625410 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average saved DMA size 2678 bytes +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes DMAcopyed 2129920 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average DMAcopyed DMA size 130 bytes +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: INFO: Post DMA optimization average DMA size 1188 bytes +2025-08-07T13:54:04Z 
INFO 49414 (sg01) [DMAOptimizationBase]: INFO: Finished set_spill_canreadUninit(module); +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: DMA optimization re-enable optimization +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: dma_optimization_sb finished after 0.210 seconds +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 616mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20193 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running address_rotation_sb +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=4722 blocks=1 instructions=20193 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: SB Rotation rotated 35 Sb address +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: SB Rotation rotated 214 Sb address +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: SB Rotation rotated 53 Sb address +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: SB Rotation rotated 26 Sb address +2025-08-07T13:54:04Z INFO 49414 (sg02) [build_flow_deps]: Build fdeps inserted 207193 edges +2025-08-07T13:54:04Z INFO 49414 (sg02) [build_flow_deps]: Done build fdeps 207193 Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: End build flow dependencies Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: Start remove useless insts Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: remove_useless_insts +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: remove Useless Instructions: 0 +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: End remove useless insts Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: Start scratchpad 
optimization Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: SB Rotation rotated 372 Sb address +2025-08-07T13:54:04Z INFO 49414 (sg02) [PreSched]: End scratchpad optimization Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: address_rotation_sb finished after 0.100 seconds +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 619mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20193 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running coloring_allocator_dram +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to coloring_allocator_dram: modules=1 functions=1 allocs=4722 blocks=1 instructions=20193 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z INFO 49414 (sg01) [ColoringAllocator::Rep]: Allocating functions +2025-08-07T13:54:04Z INFO 49414 (sg01) [ColoringAllocator::Rep]: linearize and check +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Local +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: reserved space = 232342024 bytes +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: spill space = 67108864 bytes +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: aligned spill space = 67108864 bytes +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: renumber locations +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: size = 70 +2025-08-07T13:54:04Z INFO 49414 []: find first defs for local +2025-08-07T13:54:04Z INFO 49414 []: find first defs for global +2025-08-07T13:54:04Z INFO 49414 (sg01) 
[DRAM_Allocator]: Num intervals 70 Num locations 70 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: IntervalTree Build Done +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: info.neighbors init Done +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: IntervalTree readback Done +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: simplify interference graph +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: initialize low and high +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: lo = 70 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: hi = 0 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: total = 70 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: simplify +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: new candidates = 0 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: select ranges +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: allreduce_dram_hwm 33554432 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: Real CC buffer size 33554432 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: DRAM hwm after allocation: 55443456 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DRAM_Allocator]: DRAM allocation successful +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: coloring_allocator_dram finished after 0.020 seconds +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 620mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20193 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running address_rotation_dram +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to address_rotation_dram: modules=1 functions=1 allocs=4722 blocks=1 instructions=20193 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: Runtime page size at 512MB +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: DRAM hwm before rotation 55443456 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: allreduce buffer size 524288000 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: allreduce hwm 33554432 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: Real CC buffer size 33554432 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: DRAM hwm after rotation 55443456 +2025-08-07T13:54:04Z INFO 49414 (sg01) [DMAOptimizationBase]: DRAM Rotation rotated 4 Dram address +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: address_rotation_dram finished after 0.008 seconds +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 620mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20193 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running tensorcopy_accel +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to tensorcopy_accel: modules=1 functions=1 allocs=4722 blocks=1 instructions=20193 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z INFO 49414 (sg01) [TensorCopyAccel::Impl]: Running peephole optimization pass +2025-08-07T13:54:04Z INFO 49414 (sg01) [TensorCopyAccel::Impl]: Accelerated 32 out of 826 tensorcopy in Function: sg0001 average acceleration factor: 1 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: tensorcopy_accel finished after 0.001 seconds +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 620mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20193 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running peephole_opts +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to peephole_opts: modules=1 functions=1 allocs=4722 blocks=1 instructions=20193 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z INFO 49414 (sg01) [PeepholeOpts]: PeepholeOpts enabled? Recip: true Tsp: true Tc: false SplitSelect: true SimplifyMemset true +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: peephole_opts finished after 0.008 seconds +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 620mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20449 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running lower_kernel +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to lower_kernel: modules=1 functions=1 allocs=4722 blocks=1 instructions=20449 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z INFO 49414 (sg01) [LowerKernel]: Started running LowerKernel +2025-08-07T13:54:04Z INFO 49414 (sg01) [LowerKernel]: Start of kernel lowering pass, number of insts: 20449, number of allocs: 4722 +2025-08-07T13:54:04Z INFO 49414 (sg01) [LowerKernel]: Scan BKs time (s): 0.001711 +2025-08-07T13:54:04Z INFO 49414 (sg01) [LowerKernel]: Lower BKs time (s): 4e-06 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: lower_kernel finished after 0.001 seconds +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 620mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20449 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running lower_nki_kernel +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=4722 blocks=1 instructions=20449 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: lower_nki_kernel finished after 0.001 seconds +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 620mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20449 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running dynamic_dma_cleanup +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to dynamic_dma_cleanup: modules=1 functions=1 allocs=4722 blocks=1 instructions=20449 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: dynamic_dma_cleanup finished after 0.002 seconds +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 620mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20449 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running birverifier +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=4722 blocks=1 instructions=20449 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: birverifier finished after 0.012 seconds +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 620mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20449 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running dynamic_dma_scan +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to dynamic_dma_scan: modules=1 functions=1 allocs=4722 blocks=1 instructions=20449 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: dynamic_dma_scan finished after 0.002 seconds +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 620mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20449 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z USER 49414 (sg01) [ModuleForkPass]: Running build_fdeps +2025-08-07T13:54:04Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to build_fdeps: modules=1 functions=1 allocs=4722 blocks=1 instructions=20449 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:04Z INFO 49414 (sg01) [build_flow_deps]: Start build fdeps. Invocation: 6Thu Aug 7 13:54:04 2025 +2025-08-07T13:54:04Z INFO 49414 (sg01) [build_flow_deps]: Allocs: 4722 instructions: 20449 +2025-08-07T13:54:05Z INFO 49414 (sg02) [PreSched]: DONE PRE scheduling Thu Aug 7 13:54:05 2025 +2025-08-07T13:54:05Z USER 49414 (sg02) [ModuleForkPass]: pre_sched finished after 0.556 seconds +2025-08-07T13:54:05Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 620mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:05Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11794 memory location(s), 1 block(s), and 60305 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:05Z USER 49414 (sg02) [ModuleForkPass]: Running tensor_copy_elim +2025-08-07T13:54:05Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=11794 blocks=1 instructions=60305 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:05Z INFO 49414 (sg02) [TensorCopyElim]: Tensor CP elimination: 1 +2025-08-07T13:54:05Z INFO 49414 (sg01) [build_flow_deps]: Build fdeps inserted 62004 edges +2025-08-07T13:54:05Z INFO 49414 (sg01) [build_flow_deps]: Done build fdeps 62004 Thu Aug 7 13:54:05 2025 +2025-08-07T13:54:05Z USER 49414 (sg01) [ModuleForkPass]: build_fdeps finished after 0.051 seconds +2025-08-07T13:54:05Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 620mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:05Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20449 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:05Z USER 49414 (sg01) [ModuleForkPass]: Running remove_redundancies +2025-08-07T13:54:05Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to remove_redundancies: modules=1 functions=1 allocs=4722 blocks=1 instructions=20449 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:05Z INFO 49414 (sg01) [RemoveRedundancies]: remove_clobbered_writes +2025-08-07T13:54:05Z INFO 49414 (sg01) [RemoveRedundancies]: remove_clobbered_writes: 11 +2025-08-07T13:54:05Z INFO 49414 (sg01) [RemoveRedundancies]: remove_useless_insts +2025-08-07T13:54:05Z INFO 49414 (sg01) [RemoveRedundancies]: remove Useless Instructions: 0 +2025-08-07T13:54:05Z USER 49414 (sg01) [ModuleForkPass]: remove_redundancies finished after 0.008 seconds +2025-08-07T13:54:05Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 620mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:05Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20438 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:05Z USER 49414 (sg01) [ModuleForkPass]: Running anti_dependency_analyzer +2025-08-07T13:54:05Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=4722 blocks=1 instructions=20438 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:05Z INFO 49414 (sg01) [AntiDependencyAnalyzer]: Batch size: 1000 +2025-08-07T13:54:05Z INFO 49414 (sg01) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-08-07T13:54:05Z INFO 49414 (sg01) [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 +2025-08-07T13:54:05Z INFO 49414 (sg02) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:05Z INFO 49414 (sg02) [TensorCopyElim]: remove_must_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:05Z INFO 49414 (sg02) [TensorCopyElim]: remove_redundant_alias_dmacopy removed 0 DMAcopys +2025-08-07T13:54:05Z INFO 49414 (sg02) [TensorCopyElim]: remove_redundant_internal2internal_dmacopy removed 0 DMAcopys +2025-08-07T13:54:05Z USER 49414 (sg02) [ModuleForkPass]: tensor_copy_elim finished after 0.126 seconds +2025-08-07T13:54:05Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 620mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:05Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11793 memory location(s), 1 block(s), and 60304 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:05Z USER 49414 (sg02) [ModuleForkPass]: Running dynamic_dma_setup +2025-08-07T13:54:05Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to dynamic_dma_setup: modules=1 functions=1 allocs=11793 blocks=1 instructions=60304 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:05Z USER 49414 (sg02) [ModuleForkPass]: dynamic_dma_setup finished after 0.000 seconds +2025-08-07T13:54:05Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 620mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:05Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11794 memory location(s), 1 block(s), and 60304 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:05Z USER 49414 (sg02) [ModuleForkPass]: Running runtime_memory_reservation +2025-08-07T13:54:05Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to runtime_memory_reservation: modules=1 functions=1 allocs=11794 blocks=1 instructions=60304 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:05Z USER 49414 (sg02) [ModuleForkPass]: runtime_memory_reservation finished after 0.000 seconds +2025-08-07T13:54:05Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 620mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:05Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11794 memory location(s), 1 block(s), and 60304 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:05Z USER 49414 (sg02) [ModuleForkPass]: Running coloring_allocator_psum +2025-08-07T13:54:05Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to coloring_allocator_psum: modules=1 functions=1 allocs=11794 blocks=1 instructions=60304 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:05Z INFO 49414 (sg02) [ColoringAllocator::Rep]: Allocating functions +2025-08-07T13:54:05Z INFO 49414 (sg02) [ColoringAllocator::Rep]: linearize and check +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: allocating PSUM +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: main loop +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: renumber locations +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: size = 6399 +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: build_no_bitmap start +2025-08-07T13:54:05Z USER 49414 (sg01) [ModuleForkPass]: anti_dependency_analyzer finished after 0.147 seconds +2025-08-07T13:54:05Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 623mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:05Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20438 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:05Z USER 49414 (sg01) [ModuleForkPass]: Running tensor_copy_elim +2025-08-07T13:54:05Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=4722 blocks=1 instructions=20438 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: 100% PSUM demand before spilling +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: PSUM high-water mark = 8 tensors +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: found 17772 edges +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: mean: 5.55462 +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: median: 6.99908 +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: adjacency vectors require 142176 bytes +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: build_no_bitmap done +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: find costs +2025-08-07T13:54:05Z INFO 49414 (sg01) [TensorCopyElim]: Tensor CP elimination: 0 +2025-08-07T13:54:05Z INFO 49414 (sg01) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:05Z USER 49414 (sg01) [ModuleForkPass]: tensor_copy_elim finished after 0.048 seconds +2025-08-07T13:54:05Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 623mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:05Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20438 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:05Z USER 49414 (sg01) [ModuleForkPass]: Running prefetch_scheduling_before_sched +2025-08-07T13:54:05Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to prefetch_scheduling_before_sched: modules=1 functions=1 allocs=4722 blocks=1 instructions=20438 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:05Z USER 49414 (sg01) [ModuleForkPass]: prefetch_scheduling_before_sched finished after 0.000 seconds +2025-08-07T13:54:05Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 623mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:05Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20438 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:05Z USER 49414 (sg01) [ModuleForkPass]: Running post_sched +2025-08-07T13:54:05Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to post_sched: modules=1 functions=1 allocs=4722 blocks=1 instructions=20438 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:05Z INFO 49414 [post_scheduler]: Start PosT ScheD 3 sunda Thu Aug 7 13:54:05 2025 +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: best-of-n loop, heuristic = 0, allow_psum_spill_within_accum_group = false +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: simplify interference graph +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: initialize low and high +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: lo = 6397 +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: hi = 2 +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: inf = 0 +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: total = 6399 +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: simplify +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: new candidates = 0 +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: select ranges +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: no more spills +2025-08-07T13:54:05Z INFO 49414 
(sg02) [PSUM_Allocator]: PSUM score = 0 (lower is better) +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: spilling from PSUM cost about 0 cycles +2025-08-07T13:54:05Z INFO 49414 (sg02) [PSUM_Allocator]: 100% PSUM utilization after allocation +2025-08-07T13:54:05Z USER 49414 (sg02) [ModuleForkPass]: coloring_allocator_psum finished after 0.211 seconds +2025-08-07T13:54:05Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 623mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:05Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11794 memory location(s), 1 block(s), and 60304 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:05Z USER 49414 (sg02) [ModuleForkPass]: Running dma_optimization_psum +2025-08-07T13:54:05Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to dma_optimization_psum: modules=1 functions=1 allocs=11794 blocks=1 instructions=60304 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:05Z INFO 49414 (sg02) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload instructions +2025-08-07T13:54:05Z INFO 49414 (sg02) [DMAOptimizationBase]: [psum spill optimization]: removed 0 spill/reload memory locations +2025-08-07T13:54:05Z USER 49414 (sg02) [ModuleForkPass]: dma_optimization_psum finished after 0.038 seconds +2025-08-07T13:54:05Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 623mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:05Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11794 memory location(s), 1 block(s), and 60304 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:05Z USER 49414 (sg02) [ModuleForkPass]: Running address_rotation_psum +2025-08-07T13:54:05Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to address_rotation_psum: modules=1 functions=1 allocs=11794 blocks=1 instructions=60304 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:05Z INFO 49414 (sg02) [DMAOptimizationBase]: PSUM Rotation rotated 0 PSUM Banks +2025-08-07T13:54:05Z INFO 49414 [post_scheduler]: Time-aware hwm post-sched +2025-08-07T13:54:05Z INFO 49414 (sg02) [DMAOptimizationBase]: PSUM Rotation rotated 1 PSUM Banks +2025-08-07T13:54:05Z INFO 49414 (sg02) [DMAOptimizationBase]: PSUM Rotation rotated 0 PSUM Banks +2025-08-07T13:54:05Z USER 49414 (sg02) [ModuleForkPass]: address_rotation_psum finished after 0.235 seconds +2025-08-07T13:54:05Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 631mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:05Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11794 memory location(s), 1 block(s), and 60304 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:05Z USER 49414 (sg02) [ModuleForkPass]: Running coloring_allocator_sb +2025-08-07T13:54:05Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to coloring_allocator_sb: modules=1 functions=1 allocs=11794 blocks=1 instructions=60304 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:05Z INFO 49414 (sg02) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes loaded 840812318 +2025-08-07T13:54:05Z INFO 49414 (sg02) [ColoringAllocator::Rep]: INFO: Pre GCA average loaded DMA size 2154 bytes +2025-08-07T13:54:05Z INFO 49414 (sg02) [ColoringAllocator::Rep]: INFO: Pre GCA DRAM bytes saved 17094410 +2025-08-07T13:54:05Z INFO 49414 (sg02) [ColoringAllocator::Rep]: INFO: Pre GCA average saved DMA size 2439 bytes +2025-08-07T13:54:05Z INFO 49414 (sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes DMACopyed 8196 +2025-08-07T13:54:05Z INFO 49414 (sg02) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 248 bytes +2025-08-07T13:54:05Z INFO 49414 (sg02) [ColoringAllocator::Rep]: Allocating functions +2025-08-07T13:54:05Z INFO 49414 (sg02) [ColoringAllocator::Rep]: linearize and check +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: allocating SB +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: main loop +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: renumber locations +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: size = 5357 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: find partners +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: found 6393 accumulation groups +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: largest = _dot.256-t864_i7 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: tensors = 96 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: requires 98304 bytes/partition +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: expanding partners +2025-08-07T13:54:05Z INFO 49414 []: find first defs for local +2025-08-07T13:54:05Z INFO 
49414 [post_scheduler]: Time-aware simulation time: 165729830 +2025-08-07T13:54:05Z INFO 49414 []: find first defs for global +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: find loads +2025-08-07T13:54:05Z INFO 49414 [post_scheduler]: Done PosT ScheD Thu Aug 7 13:54:05 2025 +2025-08-07T13:54:05Z USER 49414 (sg01) [ModuleForkPass]: post_sched finished after 0.504 seconds +2025-08-07T13:54:05Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 635mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:05Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20438 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:05Z USER 49414 (sg01) [ModuleForkPass]: Running expand_scheduling_units +2025-08-07T13:54:05Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to expand_scheduling_units: modules=1 functions=1 allocs=4722 blocks=1 instructions=20438 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:05Z USER 49414 (sg01) [ModuleForkPass]: expand_scheduling_units finished after 0.002 seconds +2025-08-07T13:54:05Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 635mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:05Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20438 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:05Z USER 49414 (sg01) [ModuleForkPass]: Running address_rotation_sb +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: 1 pin count +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: 1525 remat count +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: 1 pinned tensors will require about 16384 bytes/partition +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: build interference graph +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: pass 1 int-tree +2025-08-07T13:54:05Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=4722 blocks=1 instructions=20438 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: Num intervals 5357 Num locations 5357 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: IntervalTree Build Done +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: info.neighbors init Done +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: info.neighbors partners Done +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: IntervalTree readback Done +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: edge: 191190 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: mean: 71.3795 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: median: 57.9004 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: find costs +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: best-of-n loop, heuristic = 0 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: simplify interference graph +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: initialize safe & unsafe +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: safe = 3897 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: unsafe = 652 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: inf = 807 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: total = 5356 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: 
simplify +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: simplify_step3_sorted2 #Unsafe 630 #Pinned 0 #Safe 0 minCost 0.00172952 maxCost 0.723851 locations 5357 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: new candidates = 324 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: select ranges +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: Total: 5356 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: Spilled: 0.020 (105) +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: Allocated: 0.980 (5251) +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: Rover zone: 0.865 (4540) +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: Pre-rover zone: 0.004 (22) +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: Post-rover zone: 0.130 (685) +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: Slice zone: 0.001 (4) +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: Blocks nothing: 0.039 (205) +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: Blocks medium: 0.002 (11) +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: Visited until medium blocking (mean): 0.645 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: Visited until medium blocking (median): 0.711 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: Visited until medium blocking (p95): 0.731 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: Blocks tall: 0.959 (5035) +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: Visited until tall blocking (mean): 0.862 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: Visited until tall blocking (median): 0.993 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: Visited until tall blocking (p95): 1.000 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: Success +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: SB spills = 105 tensors +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: size = 155656 bytes/partition +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: 
remats = 0 tensors +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: unpinned = 0 tensors +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: size = 0 bytes/partition +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: SB score = 974301 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: best SB heuristic = 0 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: collect spills +2025-08-07T13:54:05Z INFO 49414 (sg01) [DMAOptimizationBase]: PSUM Rotation rotated 524 PSUM Banks +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: insert spills +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: deleting loads #loadsToDelete: 0 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: deleting locs #locationsToDelete: 0 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: locationsToDelete done +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: main loop +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: renumber locations +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: size = 5639 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: find partners +2025-08-07T13:54:05Z INFO 49414 (sg01) [DMAOptimizationBase]: PSUM Rotation rotated 514 PSUM Banks +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: found 6393 accumulation groups +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: largest = _dot.256-t864_i7 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: tensors = 96 +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: requires 98304 bytes/partition +2025-08-07T13:54:05Z INFO 49414 (sg02) [SB_Allocator]: expanding partners +2025-08-07T13:54:05Z INFO 49414 (sg01) [DMAOptimizationBase]: PSUM Rotation rotated 17 PSUM Banks +2025-08-07T13:54:05Z INFO 49414 (sg01) [DMAOptimizationBase]: SB Rotation rotated 12 Sb address +2025-08-07T13:54:05Z INFO 49414 []: find first defs for local +2025-08-07T13:54:06Z INFO 49414 []: find first defs for global +2025-08-07T13:54:06Z INFO 49414 (sg01) [DMAOptimizationBase]: SB 
Rotation rotated 233 Sb address +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: find loads +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: 1 pin count +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: 1759 remat count +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: 1 pinned tensors will require about 16384 bytes/partition +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: build interference graph +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: pass 1 int-tree +2025-08-07T13:54:06Z INFO 49414 (sg01) [DMAOptimizationBase]: SB Rotation rotated 30 Sb address +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: Num intervals 5639 Num locations 5639 +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: IntervalTree Build Done +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: info.neighbors init Done +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: info.neighbors partners Done +2025-08-07T13:54:06Z INFO 49414 (sg01) [DMAOptimizationBase]: SB Rotation rotated 38 Sb address +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: IntervalTree readback Done +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: edge: 157145 +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: mean: 55.7351 +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: median: 51.4479 +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: find costs +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: best-of-n loop, heuristic = 0 +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: simplify interference graph +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: initialize safe & unsafe +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: safe = 111 +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: unsafe = 46 +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: inf = 230 +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: total = 387 +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: simplify 
+2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: simplify_step3_sorted2 #Unsafe 44 #Pinned 0 #Safe 0 minCost 0.00408226 maxCost 0.0326828 locations 5639 +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: new candidates = 43 +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: (including 230 infinite cost tensors) +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: select ranges +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: Total: 387 +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: Spilled: 0.000 (0) +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: Allocated: 1.000 (387) +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: Rover zone: 0.488 (189) +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: Pre-rover zone: 0.008 (3) +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: Post-rover zone: 0.504 (195) +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: Slice zone: 0.000 (0) +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: Blocks nothing: 0.000 (0) +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: Blocks medium: 0.000 (0) +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: Blocks tall: 1.000 (387) +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: Visited until tall blocking (mean): 1.000 +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: Visited until tall blocking (median): 1.000 +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: Visited until tall blocking (p95): 1.000 +2025-08-07T13:54:06Z INFO 49414 (sg02) [SB_Allocator]: Success +2025-08-07T13:54:06Z INFO 49414 (sg01) [DMAOptimizationBase]: SB Rotation rotated 302 Sb address +2025-08-07T13:54:06Z INFO 49414 (sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-08-07T13:54:06Z INFO 49414 (sg01) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-08-07T13:54:06Z USER 49414 (sg01) [ModuleForkPass]: address_rotation_sb finished after 0.378 seconds +2025-08-07T13:54:06Z INFO 49414 (sg01) 
[ModuleForkPass]: curr_vmrss: 640mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:06Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20438 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:06Z USER 49414 (sg01) [ModuleForkPass]: Running anti_dependency_analyzer +2025-08-07T13:54:06Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=4722 blocks=1 instructions=20438 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:06Z INFO 49414 (sg01) [AntiDependencyAnalyzer]: Batch size: 1000 +2025-08-07T13:54:06Z INFO 49414 (sg01) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-08-07T13:54:06Z INFO 49414 (sg01) [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 +2025-08-07T13:54:06Z USER 49414 (sg01) [ModuleForkPass]: anti_dependency_analyzer finished after 0.098 seconds +2025-08-07T13:54:06Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 644mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:06Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20438 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:06Z USER 49414 (sg01) [ModuleForkPass]: Running anti_dependency_analyzer +2025-08-07T13:54:06Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=4722 blocks=1 instructions=20438 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:06Z INFO 49414 (sg01) [AntiDependencyAnalyzer]: Batch size: 1000 +2025-08-07T13:54:06Z INFO 49414 (sg01) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS} +2025-08-07T13:54:06Z INFO 49414 (sg01) [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 +2025-08-07T13:54:06Z USER 49414 (sg01) [ModuleForkPass]: anti_dependency_analyzer finished after 0.013 seconds +2025-08-07T13:54:06Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 644mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:06Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20438 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:06Z USER 49414 (sg01) [ModuleForkPass]: Running dep_opt +2025-08-07T13:54:06Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to dep_opt: modules=1 functions=1 allocs=4722 blocks=1 instructions=20438 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:06Z INFO 49414 (sg01) [build_flow_deps]: Start build fdeps. 
Invocation: 7Thu Aug 7 13:54:06 2025 +2025-08-07T13:54:06Z INFO 49414 (sg01) [build_flow_deps]: Allocs: 4722 instructions: 20438 +2025-08-07T13:54:06Z INFO 49414 (sg01) [build_flow_deps]: Build fdeps inserted 61513 edges +2025-08-07T13:54:06Z INFO 49414 (sg01) [build_flow_deps]: Done build fdeps 61513 Thu Aug 7 13:54:06 2025 +2025-08-07T13:54:06Z USER 49414 (sg01) [ModuleForkPass]: dep_opt finished after 0.063 seconds +2025-08-07T13:54:06Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 644mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:06Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20438 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:06Z USER 49414 (sg01) [ModuleForkPass]: Running report_stats +2025-08-07T13:54:06Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to report_stats: modules=1 functions=1 allocs=4722 blocks=1 instructions=20438 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:06Z INFO 49414 (sg01) [ReportStats]: Data Movement Statistics: sg0001 +┌──────────────┬────────────────────────────┬───────┬───────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├──────────────┼────────────────────────────┼───────┼───────────┤ +│ DMACopy │ Input -> Internal │ 1 │ 25165824 │ +│ DMACopy │ Internal -> ExternalOutput │ 64 │ 67108864 │ +│ DMACopy │ Internal -> Output │ 1 │ 16777216 │ +│ Load │ Const -> Internal │ 2 │ 65536 │ +│ Load │ ExternalInput -> Internal │ 1972 │ 260063744 │ +│ Load │ Input -> Internal │ 6 │ 2097152 │ +│ Load │ Internal │ 132 │ 47448064 │ +│ Save │ Internal │ 99 │ 31195136 │ +│ Save │ Internal -> Output │ 17 │ 8388610 │ +│ Save (Spill) │ Internal │ 44 │ 13041664 │ +└──────────────┴────────────────────────────┴───────┴───────────┘ + +2025-08-07T13:54:06Z INFO 49414 (sg01) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 3 │ +│ 64 │ 2 │ +│ 256 │ 1538 │ +│ 1024 │ 73 │ +│ 2048 │ 156 │ 
+│ 4096 │ 500 │ +│ 262144 │ 64 │ +│ 8388608 │ 5 │ +└─────────────────────┴───────┘ + +2025-08-07T13:54:06Z INFO 49414 (sg01) [ReportStats]: MM Stats: #MatMults 14360 #MatMult-Transposes 1904 +2025-08-07T13:54:06Z INFO 49414 (sg01) [ReportStats]: IO Tensor size combined: 197149188 +2025-08-07T13:54:06Z INFO 49414 (sg01) [ReportStats]: IO Tensor Statistics: +┌────────────────────┬────────────────┬──────────┬──────────────┐ +│ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────┼────────────────┼──────────┼──────────────┤ +│ input87 │ ExternalInput │ bfloat16 │ 50331648 │ +│ input84 │ ExternalInput │ bfloat16 │ 50331648 │ +│ input85 │ ExternalInput │ bfloat16 │ 50331648 │ +│ input88 │ ExternalInput │ bfloat16 │ 16777216 │ +│ input94 │ ExternalInput │ bfloat16 │ 16777216 │ +│ input92 │ ExternalInput │ bfloat16 │ 4194304 │ +│ input89 │ ExternalInput │ bfloat16 │ 4194304 │ +│ output4 │ ExternalOutput │ bfloat16 │ 1048576 │ +│ input7 │ ExternalInput │ bfloat16 │ 1048576 │ +│ input6 │ ExternalInput │ bfloat16 │ 1048576 │ +└────────────────────┴────────────────┴──────────┴──────────────┘ + +2025-08-07T13:54:06Z INFO 49414 (sg01) [ReportStats]: Large (Internal) Tensor Statistics: +┌───────────────────────────┬──────────┬──────────┬──────────────┐ +│ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ +├───────────────────────────┼──────────┼──────────┼──────────────┤ +│ intermediate1 │ Input │ bfloat16 │ 8388608 │ +│ add.4 │ Internal │ bfloat16 │ 8388608 │ +│ dot.7-buffer-1826 │ Internal │ bfloat16 │ 8388608 │ +│ intermediate6 │ Output │ bfloat16 │ 8388608 │ +│ intermediate4 │ Input │ bfloat16 │ 8388608 │ +│ intermediate7 │ Output │ bfloat16 │ 8388608 │ +│ all_reduce.1-buffer-1828 │ Internal │ bfloat16 │ 8388608 │ +│ intermediate7-buffer-1833 │ Internal │ bfloat16 │ 8388608 │ +│ dot.11-buffer-1831 │ Internal │ bfloat16 │ 8388608 │ +│ DynamicDMAScratchLoc │ Internal │ uint8 │ 2097152 │ +└───────────────────────────┴──────────┴──────────┴──────────────┘ 
+ +2025-08-07T13:54:06Z USER 49414 (sg01) [ModuleForkPass]: report_stats finished after 0.004 seconds +2025-08-07T13:54:06Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 644mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:06Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20438 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:30Z INFO 49414 (sg02) [SB_Allocator]: SB spills = 0 tensors +2025-08-07T13:54:30Z INFO 49414 (sg02) [SB_Allocator]: size = 0 bytes/partition +2025-08-07T13:54:30Z INFO 49414 (sg02) [SB_Allocator]: remats = 0 tensors +2025-08-07T13:54:30Z INFO 49414 (sg02) [SB_Allocator]: unpinned = 0 tensors +2025-08-07T13:54:30Z INFO 49414 (sg02) [SB_Allocator]: size = 0 bytes/partition +2025-08-07T13:54:30Z INFO 49414 (sg02) [SB_Allocator]: SB score = 0 +2025-08-07T13:54:30Z INFO 49414 (sg02) [SB_Allocator]: spilling from SB cost about 974301 cycles +2025-08-07T13:54:30Z INFO 49414 (sg02) [SB_Allocator]: number of tensors spilled from SB = 105 +2025-08-07T13:54:30Z INFO 49414 (sg02) [SB_Allocator]: total size of spilled tensors = 155656 bytes/partition +2025-08-07T13:54:30Z INFO 49414 (sg02) [SB_Allocator]: 16384 bytes/partition (100%) successfully pinned +2025-08-07T13:54:30Z INFO 49414 (sg02) [SB_Allocator]: pinning saved approximately 9010 cycles +2025-08-07T13:54:30Z INFO 49414 (sg02) [SB_Allocator]: 0% SB utilization after allocation +2025-08-07T13:54:30Z INFO 49414 (sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes loaded 907008030 +2025-08-07T13:54:30Z INFO 49414 (sg02) [ColoringAllocator::Rep]: INFO: Post GCA average loaded DMA size 2127 bytes +2025-08-07T13:54:30Z INFO 49414 (sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes saved 62183434 +2025-08-07T13:54:30Z INFO 49414 (sg02) [ColoringAllocator::Rep]: INFO: Post GCA average saved DMA size 2338 bytes +2025-08-07T13:54:30Z INFO 49414 (sg02) [ColoringAllocator::Rep]: INFO: Post GCA DRAM bytes 
DMACopyed 8196 +2025-08-07T13:54:30Z INFO 49414 (sg02) [ColoringAllocator::Rep]: INFO: Post GCA average DMACopyed DMA size 248 bytes +2025-08-07T13:54:30Z USER 49414 (sg02) [ModuleForkPass]: coloring_allocator_sb finished after 25.106 seconds +2025-08-07T13:54:30Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 644mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:30Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 12181 memory location(s), 1 block(s), and 60739 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:30Z USER 49414 (sg02) [ModuleForkPass]: Running address_rotation_sb +2025-08-07T13:54:30Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=12181 blocks=1 instructions=60739 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:30Z INFO 49414 (sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-08-07T13:54:30Z USER 49414 (sg02) [ModuleForkPass]: address_rotation_sb finished after 0.081 seconds +2025-08-07T13:54:30Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 644mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:30Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 12181 memory location(s), 1 block(s), and 60739 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:30Z USER 49414 (sg02) [ModuleForkPass]: Running dma_optimization_sb +2025-08-07T13:54:30Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to dma_optimization_sb: modules=1 functions=1 allocs=12181 blocks=1 instructions=60739 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:30Z INFO 49414 (sg02) [DMAOptimizationBase]: DMA optimization In bytes loaded or saved 969191464, 84.99% input load, 4.12715e-07% output write, 15.01% spill/reload [sg0002] +2025-08-07T13:54:30Z INFO 49414 (sg02) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-08-07T13:54:30Z INFO 49414 (sg02) [DMAOptimizationBase]: removed 0 identical load +2025-08-07T13:54:30Z INFO 49414 (sg02) [DMAOptimizationBase]: adjusted 0 DMACopy remat +2025-08-07T13:54:30Z INFO 49414 (sg02) [DMAOptimizationBase]: adjusted 0 DMACopy remat +2025-08-07T13:54:30Z INFO 49414 (sg02) [DMAOptimizationBase]: sub-graph will get execute 1 times +2025-08-07T13:54:30Z INFO 49414 (sg02) [DMAOptimizationBase]: [Load Merging]: removed 0 remat/cloned instructions +2025-08-07T13:54:30Z INFO 49414 (sg02) [DMAOptimizationBase]: [Load shrink]: shrinked 0 GCA remat/cloned instructions +2025-08-07T13:54:30Z INFO 49414 (sg02) [DMAOptimizationBase]: [Load Merging + Load shrink] reduced input/const loading DMA traffic 4096, 0.00042262% out of total dma traffic(8.23716e+08) +2025-08-07T13:54:30Z INFO 49414 (sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 164 spill/reload instructions +2025-08-07T13:54:30Z INFO 49414 (sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 127 spill/reload memory locations +2025-08-07T13:54:30Z INFO 49414 (sg02) [DMAOptimizationBase]: [spill optimization round 1]: removed 15 spill/reload instructions +2025-08-07T13:54:30Z INFO 49414 (sg02) [DMAOptimizationBase]: [spill optimization round 1]: removed 8 spill/reload memory locations +2025-08-07T13:54:30Z INFO 49414 (sg02) 
[DMAOptimizationBase]: [spill optimization round 2]: removed 0 spill/reload instructions +2025-08-07T13:54:30Z INFO 49414 (sg02) [DMAOptimizationBase]: [spill optimization round 2]: removed 0 spill/reload memory locations +2025-08-07T13:54:30Z INFO 49414 (sg02) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 62526724, 42.9809% out of total spill/reload dma traffic +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload instructions +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: [Allocation optimization]: removed 0 spill/reload memory locations +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: [Re-allocation Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload instructions +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: [spill optimization round 0]: removed 0 spill/reload memory locations +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: [Spill Optimization] reduced DMA traffic 0, 0% out of total spill/reload dma traffic +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: [remove extra save] removed 0 memlocs and 0 instructions +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: [remove_memset_spill]: removed 13 spill/reload instructions +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: [remove_memset_spill]: removed 1 spill/reload memory locations +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 0 combined 176 SpillSaves and Reloads +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: average loaded DMA size 2165 bytes +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: average saved DMA size 2573 
bytes +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 1 combined 88 SpillSaves and Reloads +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: average loaded DMA size 2186 bytes +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: average saved DMA size 3159 bytes +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: DMA SpillSave Coalescing Round 2 combined 0 SpillSaves and Reloads +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: average loaded DMA size 2186 bytes +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: average saved DMA size 3159 bytes +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes loaded 867544860 +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing average loaded DMA size 2186 bytes +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing DRAM bytes saved 39112456 +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: INFO: Post DMA coalescing average saved DMA size 3159 bytes +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: [DMA optimization]Reload_just_for_save Optimization removed 0 memlocs +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: [Experiment partial DMA access] reduced DMA traffic 3328, 0.00228767% out of total spill/reload dma traffic +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: [DMA optimization] reduced DMA traffic 62534148, 6.4522% out of total dma traffic +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: DMA optimization Out bytes loaded or saved 906657316, 90.8515% input load, 4.41181e-07% output write, 9.1485% spill/reload [sg0002] +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes loaded 867544860 +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average 
loaded DMA size 2186 bytes +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes saved 39112456 +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average saved DMA size 3159 bytes +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: INFO: Post DMA optimization DRAM bytes DMAcopyed 8196 +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average DMAcopyed DMA size 248 bytes +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: INFO: Post DMA optimization average DMA size 2215 bytes +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: INFO: Finished set_spill_canreadUninit(module); +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: DMA optimization re-enable optimization +2025-08-07T13:54:31Z USER 49414 (sg02) [ModuleForkPass]: dma_optimization_sb finished after 0.519 seconds +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 648mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60402 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:31Z USER 49414 (sg02) [ModuleForkPass]: Running address_rotation_sb +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=11821 blocks=1 instructions=60402 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: SB Rotation rotated 320 Sb address +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: SB Rotation rotated 783 Sb address +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: SB Rotation rotated 229 Sb address +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: SB Rotation rotated 12 Sb address +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: SB Rotation rotated 243 Sb address +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-08-07T13:54:31Z USER 49414 (sg02) [ModuleForkPass]: address_rotation_sb finished after 0.344 seconds +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 648mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60402 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:31Z USER 49414 (sg02) [ModuleForkPass]: Running coloring_allocator_dram +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to coloring_allocator_dram: modules=1 functions=1 allocs=11821 blocks=1 instructions=60402 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:31Z INFO 49414 (sg02) [ColoringAllocator::Rep]: Allocating functions +2025-08-07T13:54:31Z INFO 49414 (sg02) [ColoringAllocator::Rep]: linearize and check +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: allocating spills in DRAM pre_link mode for address space Local +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: reserved space = 790157338 bytes +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: spill space = 53796612 bytes +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: aligned spill space = 53837824 bytes +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: renumber locations +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: size = 56 +2025-08-07T13:54:31Z INFO 49414 []: find first defs for local +2025-08-07T13:54:31Z INFO 49414 []: find first defs for global +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: Num intervals 56 Num locations 56 +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: IntervalTree Build Done +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: info.neighbors init Done +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: IntervalTree readback Done +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: simplify interference graph +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: initialize low and high +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: lo = 56 +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: hi = 0 +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: total = 56 +2025-08-07T13:54:31Z INFO 49414 (sg02) 
[DRAM_Allocator]: simplify +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: new candidates = 0 +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: select ranges +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: allreduce_dram_hwm 16793600 +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: Real CC buffer size 16793600 +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: DRAM hwm after allocation: 38813696 +2025-08-07T13:54:31Z INFO 49414 (sg02) [DRAM_Allocator]: DRAM allocation successful +2025-08-07T13:54:31Z USER 49414 (sg02) [ModuleForkPass]: coloring_allocator_dram finished after 0.087 seconds +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 648mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60402 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:31Z USER 49414 (sg02) [ModuleForkPass]: Running address_rotation_dram +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to address_rotation_dram: modules=1 functions=1 allocs=11821 blocks=1 instructions=60402 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: Runtime page size at 512MB +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: DRAM hwm before rotation 38813696 +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: allreduce buffer size 524288000 +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: allreduce hwm 16793600 +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: Real CC buffer size 16793600 +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: DRAM hwm after rotation 38813696 +2025-08-07T13:54:31Z INFO 49414 (sg02) [DMAOptimizationBase]: DRAM Rotation rotated 10 Dram address +2025-08-07T13:54:31Z USER 49414 
(sg02) [ModuleForkPass]: address_rotation_dram finished after 0.037 seconds +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 648mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60402 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:31Z USER 49414 (sg02) [ModuleForkPass]: Running tensorcopy_accel +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to tensorcopy_accel: modules=1 functions=1 allocs=11821 blocks=1 instructions=60402 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:31Z INFO 49414 (sg02) [TensorCopyAccel::Impl]: Running peephole optimization pass +2025-08-07T13:54:31Z INFO 49414 (sg02) [TensorCopyAccel::Impl]: Accelerated 0 out of 6323 tensorcopy in Function: sg0002 average acceleration factor: -nan +2025-08-07T13:54:31Z USER 49414 (sg02) [ModuleForkPass]: tensorcopy_accel finished after 0.005 seconds +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 648mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60402 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:31Z USER 49414 (sg02) [ModuleForkPass]: Running peephole_opts +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to peephole_opts: modules=1 functions=1 allocs=11821 blocks=1 instructions=60402 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:31Z INFO 49414 (sg02) [PeepholeOpts]: PeepholeOpts enabled? 
Recip: true Tsp: true Tc: false SplitSelect: true SimplifyMemset true +2025-08-07T13:54:31Z USER 49414 (sg02) [ModuleForkPass]: peephole_opts finished after 0.016 seconds +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 648mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:31Z USER 49414 (sg02) [ModuleForkPass]: Running lower_kernel +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to lower_kernel: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:31Z INFO 49414 (sg02) [LowerKernel]: Started running LowerKernel +2025-08-07T13:54:31Z INFO 49414 (sg02) [LowerKernel]: Start of kernel lowering pass, number of insts: 60406, number of allocs: 11821 +2025-08-07T13:54:31Z INFO 49414 (sg02) [LowerKernel]: Scan BKs time (s): 0.003405 +2025-08-07T13:54:31Z INFO 49414 (sg02) [LowerKernel]: Lower BKs time (s): 6e-06 +2025-08-07T13:54:31Z USER 49414 (sg02) [ModuleForkPass]: lower_kernel finished after 0.004 seconds +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 648mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:31Z USER 49414 (sg02) [ModuleForkPass]: Running lower_nki_kernel +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to lower_nki_kernel: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:31Z USER 49414 (sg02) [ModuleForkPass]: lower_nki_kernel finished after 0.004 seconds +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 648mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:31Z USER 49414 (sg02) [ModuleForkPass]: Running dynamic_dma_cleanup +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to dynamic_dma_cleanup: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:31Z USER 49414 (sg02) [ModuleForkPass]: dynamic_dma_cleanup finished after 0.007 seconds +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 648mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:31Z USER 49414 (sg02) [ModuleForkPass]: Running birverifier +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to birverifier: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:31Z USER 49414 (sg02) [ModuleForkPass]: birverifier finished after 0.042 seconds +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 648mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:31Z USER 49414 (sg02) [ModuleForkPass]: Running dynamic_dma_scan +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to dynamic_dma_scan: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:31Z USER 49414 (sg02) [ModuleForkPass]: dynamic_dma_scan finished after 0.007 seconds +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 648mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:31Z USER 49414 (sg02) [ModuleForkPass]: Running build_fdeps +2025-08-07T13:54:31Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to build_fdeps: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:31Z INFO 49414 (sg02) [build_flow_deps]: Start build fdeps. 
Invocation: 8Thu Aug 7 13:54:31 2025 +2025-08-07T13:54:31Z INFO 49414 (sg02) [build_flow_deps]: Allocs: 11821 instructions: 60406 +2025-08-07T13:54:32Z INFO 49414 (sg02) [build_flow_deps]: Build fdeps inserted 207169 edges +2025-08-07T13:54:32Z INFO 49414 (sg02) [build_flow_deps]: Done build fdeps 207169 Thu Aug 7 13:54:32 2025 +2025-08-07T13:54:32Z USER 49414 (sg02) [ModuleForkPass]: build_fdeps finished after 0.145 seconds +2025-08-07T13:54:32Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 648mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:32Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:32Z USER 49414 (sg02) [ModuleForkPass]: Running remove_redundancies +2025-08-07T13:54:32Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to remove_redundancies: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:32Z INFO 49414 (sg02) [RemoveRedundancies]: remove_clobbered_writes +2025-08-07T13:54:32Z INFO 49414 (sg02) [RemoveRedundancies]: remove_clobbered_writes: 0 +2025-08-07T13:54:32Z INFO 49414 (sg02) [RemoveRedundancies]: remove_useless_insts +2025-08-07T13:54:32Z INFO 49414 (sg02) [RemoveRedundancies]: remove Useless Instructions: 0 +2025-08-07T13:54:32Z USER 49414 (sg02) [ModuleForkPass]: remove_redundancies finished after 0.021 seconds +2025-08-07T13:54:32Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 648mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:32Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:32Z USER 49414 (sg02) [ModuleForkPass]: Running anti_dependency_analyzer +2025-08-07T13:54:32Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:32Z INFO 49414 (sg02) [AntiDependencyAnalyzer]: Batch size: 1000 +2025-08-07T13:54:32Z INFO 49414 (sg02) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-08-07T13:54:32Z INFO 49414 (sg02) [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 +2025-08-07T13:54:32Z USER 49414 (sg02) [ModuleForkPass]: anti_dependency_analyzer finished after 0.297 seconds +2025-08-07T13:54:32Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 683mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:32Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:32Z USER 49414 (sg02) [ModuleForkPass]: Running tensor_copy_elim +2025-08-07T13:54:32Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to tensor_copy_elim: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:32Z INFO 49414 (sg02) [TensorCopyElim]: Tensor CP elimination: 0 +2025-08-07T13:54:32Z INFO 49414 (sg02) [TensorCopyElim]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:32Z USER 49414 (sg02) [ModuleForkPass]: tensor_copy_elim finished after 0.076 seconds +2025-08-07T13:54:32Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 662mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:32Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:32Z USER 49414 (sg02) [ModuleForkPass]: Running prefetch_scheduling_before_sched +2025-08-07T13:54:32Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to prefetch_scheduling_before_sched: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:32Z USER 49414 (sg02) [ModuleForkPass]: prefetch_scheduling_before_sched finished after 0.000 seconds +2025-08-07T13:54:32Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 662mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:32Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:32Z USER 49414 (sg02) [ModuleForkPass]: Running post_sched +2025-08-07T13:54:32Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to post_sched: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:32Z INFO 49414 [post_scheduler]: Start PosT ScheD 3 sunda Thu Aug 7 13:54:32 2025 +2025-08-07T13:54:33Z INFO 49414 [post_scheduler]: Time-aware hwm post-sched +2025-08-07T13:54:33Z INFO 49414 [post_scheduler]: Time-aware simulation time: 8207905 +2025-08-07T13:54:34Z INFO 49414 [post_scheduler]: Done PosT ScheD Thu Aug 7 13:54:34 2025 +2025-08-07T13:54:34Z USER 49414 (sg02) [ModuleForkPass]: post_sched finished after 1.661 seconds +2025-08-07T13:54:34Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 719mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:34Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:34Z USER 49414 (sg02) [ModuleForkPass]: Running expand_scheduling_units +2025-08-07T13:54:34Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to expand_scheduling_units: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:34Z USER 49414 (sg02) [ModuleForkPass]: expand_scheduling_units finished after 0.026 seconds +2025-08-07T13:54:34Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 691mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:34Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:34Z USER 49414 (sg02) [ModuleForkPass]: Running address_rotation_sb +2025-08-07T13:54:34Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to address_rotation_sb: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:34Z INFO 49414 (sg02) [DMAOptimizationBase]: PSUM Rotation rotated 4113 PSUM Banks +2025-08-07T13:54:34Z INFO 49414 (sg02) [DMAOptimizationBase]: PSUM Rotation rotated 4693 PSUM Banks +2025-08-07T13:54:34Z INFO 49414 (sg02) [DMAOptimizationBase]: PSUM Rotation rotated 1 PSUM Banks +2025-08-07T13:54:34Z INFO 49414 (sg02) [DMAOptimizationBase]: SB Rotation rotated 12 Sb address +2025-08-07T13:54:34Z INFO 49414 (sg02) [DMAOptimizationBase]: SB Rotation rotated 227 Sb address +2025-08-07T13:54:35Z INFO 49414 (sg02) [DMAOptimizationBase]: SB Rotation rotated 43 Sb address +2025-08-07T13:54:35Z INFO 49414 (sg02) [DMAOptimizationBase]: SB Rotation rotated 28 Sb address +2025-08-07T13:54:35Z INFO 49414 (sg02) [DMAOptimizationBase]: SB Rotation rotated 181 Sb address +2025-08-07T13:54:35Z INFO 49414 (sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address +2025-08-07T13:54:35Z INFO 49414 (sg02) [DMAOptimizationBase]: SB Rotation rotated 0 Sb address 
+2025-08-07T13:54:35Z USER 49414 (sg02) [ModuleForkPass]: address_rotation_sb finished after 1.155 seconds +2025-08-07T13:54:35Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 697mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z USER 49414 (sg02) [ModuleForkPass]: Running anti_dependency_analyzer +2025-08-07T13:54:35Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z INFO 49414 (sg02) [AntiDependencyAnalyzer]: Batch size: 1000 +2025-08-07T13:54:35Z INFO 49414 (sg02) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS,PSUM,SB} +2025-08-07T13:54:35Z INFO 49414 (sg02) [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 +2025-08-07T13:54:35Z USER 49414 (sg02) [ModuleForkPass]: anti_dependency_analyzer finished after 0.206 seconds +2025-08-07T13:54:35Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 721mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z USER 49414 (sg02) [ModuleForkPass]: Running anti_dependency_analyzer +2025-08-07T13:54:35Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to anti_dependency_analyzer: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z INFO 49414 (sg02) [AntiDependencyAnalyzer]: Batch size: 1000 +2025-08-07T13:54:35Z INFO 49414 (sg02) [AntiDependencyAnalyzer]: Analysis types: {DRAM,ALIAS} +2025-08-07T13:54:35Z INFO 49414 (sg02) [AntiDependencyAnalyzer]: DRAM size: 17179869184 num-bins: 16 bin-size: 1073741824 +2025-08-07T13:54:35Z USER 49414 (sg02) [ModuleForkPass]: anti_dependency_analyzer finished after 0.038 seconds +2025-08-07T13:54:35Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 689mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z USER 49414 (sg02) [ModuleForkPass]: Running dep_opt +2025-08-07T13:54:35Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to dep_opt: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z INFO 49414 (sg02) [build_flow_deps]: Start build fdeps. 
Invocation: 9Thu Aug 7 13:54:35 2025 +2025-08-07T13:54:35Z INFO 49414 (sg02) [build_flow_deps]: Allocs: 11821 instructions: 60406 +2025-08-07T13:54:35Z INFO 49414 (sg02) [build_flow_deps]: Build fdeps inserted 203516 edges +2025-08-07T13:54:35Z INFO 49414 (sg02) [build_flow_deps]: Done build fdeps 203516 Thu Aug 7 13:54:35 2025 +2025-08-07T13:54:35Z USER 49414 (sg02) [ModuleForkPass]: dep_opt finished after 0.225 seconds +2025-08-07T13:54:35Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 694mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z USER 49414 (sg02) [ModuleForkPass]: Running report_stats +2025-08-07T13:54:35Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to report_stats: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z INFO 49414 (sg02) [ReportStats]: Data Movement Statistics: sg0002 +┌──────────────┬────────────────────────────┬───────┬───────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├──────────────┼────────────────────────────┼───────┼───────────┤ +│ DMACopy │ Input -> Internal │ 1 │ 25165824 │ +│ DMACopy │ Internal │ 1 │ 8388608 │ +│ Load │ Const -> Internal │ 4 │ 34824 │ +│ Load │ ExternalInput -> Internal │ 3018 │ 823676940 │ +│ Load │ Internal │ 100 │ 43833096 │ +│ Save │ Internal │ 675 │ 28626692 │ +│ Save │ Internal -> ExternalOutput │ 1 │ 4 │ +│ Save (Spill) │ Internal │ 20 │ 10485760 │ +└──────────────┴────────────────────────────┴───────┴───────────┘ + +2025-08-07T13:54:35Z INFO 49414 (sg02) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 1 │ +│ 4 │ 9 │ +│ 8 │ 2 │ +│ 16 │ 3 │ +│ 64 │ 2 │ +│ 256 │ 1538 │ +│ 512 │ 593 │ +│ 1024 │ 14 │ +│ 2048 │ 34 │ +│ 4096 │ 1618 │ +│ 60768 │ 1 │ +│ 60776 │ 4 │ +│ 
8388608 │ 3 │ +└─────────────────────┴───────┘ + +2025-08-07T13:54:35Z INFO 49414 (sg02) [ReportStats]: MM Stats: #MatMults 48723 #MatMult-Transposes 20371 +2025-08-07T13:54:35Z INFO 49414 (sg02) [ReportStats]: IO Tensor size combined: 773345296 +2025-08-07T13:54:35Z INFO 49414 (sg02) [ReportStats]: IO Tensor Statistics: +┌────────────────────┬────────────────┬──────────┬──────────────┐ +│ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────┼────────────────┼──────────┼──────────────┤ +│ input473 │ ExternalInput │ bfloat16 │ 622329856 │ +│ input469 │ ExternalInput │ bfloat16 │ 50331648 │ +│ input472 │ ExternalInput │ bfloat16 │ 50331648 │ +│ input470 │ ExternalInput │ bfloat16 │ 50331648 │ +│ input474 │ ExternalInput │ bfloat16 │ 8192 │ +│ input471 │ ExternalInput │ bfloat16 │ 8192 │ +│ input1 │ ExternalInput │ int32 │ 4096 │ +│ input3 │ ExternalInput │ float32 │ 12 │ +│ output0 │ ExternalOutput │ int32 │ 4 │ +└────────────────────┴────────────────┴──────────┴──────────────┘ + +2025-08-07T13:54:35Z INFO 49414 (sg02) [ReportStats]: Large (Internal) Tensor Statistics: +┌──────────────────────────┬──────────┬──────────┬──────────────┐ +│ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ +├──────────────────────────┼──────────┼──────────┼──────────────┤ +│ all_reduce.3-buffer-2825 │ Internal │ bfloat16 │ 8388608 │ +│ dot.14-buffer-2823 │ Internal │ bfloat16 │ 8388608 │ +│ intermediate109 │ Input │ bfloat16 │ 8388608 │ +│ convert.57 │ Internal │ bfloat16 │ 8388608 │ +│ add.9 │ Internal │ bfloat16 │ 8388608 │ +│ intermediate108 │ Input │ bfloat16 │ 8388608 │ +│ DynamicDMAScratchLoc │ Internal │ uint8 │ 2097152 │ +│ -t2849 │ Internal │ float32 │ 1048576 │ +│ -t2843 │ Internal │ float32 │ 1048576 │ +│ -t2838 │ Internal │ float32 │ 1048576 │ +└──────────────────────────┴──────────┴──────────┴──────────────┘ + +2025-08-07T13:54:35Z USER 49414 (sg02) [ModuleForkPass]: report_stats finished after 0.016 seconds +2025-08-07T13:54:35Z INFO 49414 (sg02) 
[ModuleForkPass]: curr_vmrss: 687mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z USER 49414 [ModuleForkPass]: Compilation status: Total modules: 3, Passed: 3, Failed: 0 +2025-08-07T13:54:35Z USER 49414 [BackendPassManager]: mod_parallel_pass finished after 31.997 seconds +2025-08-07T13:54:35Z INFO 49414 [BackendPassManager]: curr_vmrss: 687mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 [BackendPassManager]: Output has 3 module(s), 3 function(s), 20442 memory location(s), 3 block(s), and 89295 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z USER 49414 [BackendPassManager]: Running assign_trigger_engine +2025-08-07T13:54:35Z INFO 49414 [BackendPassManager]: Inputs to assign_trigger_engine: modules=3 functions=3 allocs=20442 blocks=3 instructions=89295 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z INFO 49414 (sg00) [AssignTriggerEngine]: Assigned trigger engine for 163 DMA instructions. Moved 50 DMA instructions to CC's engines. +2025-08-07T13:54:35Z INFO 49414 (sg01) [AssignTriggerEngine]: Assigned trigger engine for 160 DMA instructions. Moved 17 DMA instructions to CC's engines. +2025-08-07T13:54:35Z INFO 49414 (sg02) [AssignTriggerEngine]: Assigned trigger engine for 714 DMA instructions. Moved 19 DMA instructions to CC's engines. +2025-08-07T13:54:35Z USER 49414 [BackendPassManager]: assign_trigger_engine finished after 0.089 seconds +2025-08-07T13:54:35Z INFO 49414 [BackendPassManager]: curr_vmrss: 677mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 [BackendPassManager]: Output has 3 module(s), 3 function(s), 20442 memory location(s), 3 block(s), and 89295 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z USER 49414 [BackendPassManager]: Running subgraph_parallel_pass +2025-08-07T13:54:35Z INFO 49414 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=3 functions=3 allocs=20442 blocks=3 instructions=89295 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z USER 49414 (sg00) [SubgraphForkPass]: Running lower_local_collectives +2025-08-07T13:54:35Z INFO 49414 (sg00) [SubgraphForkPass]: Inputs to lower_local_collectives: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:35Z USER 49414 (sg01) [SubgraphForkPass]: Running lower_local_collectives +2025-08-07T13:54:35Z USER 49414 (sg00) [SubgraphForkPass]: lower_local_collectives finished after 0.000 seconds +2025-08-07T13:54:35Z USER 49414 (sg02) [SubgraphForkPass]: Running lower_local_collectives +2025-08-07T13:54:35Z INFO 49414 (sg00) [SubgraphForkPass]: curr_vmrss: 677mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 (sg00) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:35Z USER 49414 (sg00) [SubgraphForkPass]: Running extend_shared_lifetimes +2025-08-07T13:54:35Z INFO 49414 (sg00) [SubgraphForkPass]: Inputs to extend_shared_lifetimes: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:35Z USER 49414 (sg00) [SubgraphForkPass]: extend_shared_lifetimes finished after 0.000 seconds +2025-08-07T13:54:35Z INFO 49414 (sg00) [SubgraphForkPass]: curr_vmrss: 677mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 (sg01) [SubgraphForkPass]: Inputs to lower_local_collectives: modules=1 functions=1 allocs=4722 blocks=1 instructions=20438 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:35Z INFO 49414 (sg02) [SubgraphForkPass]: Inputs to lower_local_collectives: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z USER 49414 (sg02) [SubgraphForkPass]: lower_local_collectives finished after 0.000 seconds +2025-08-07T13:54:35Z USER 49414 (sg01) [SubgraphForkPass]: lower_local_collectives finished after 0.000 seconds +2025-08-07T13:54:35Z INFO 49414 (sg00) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:35Z INFO 49414 (sg02) [SubgraphForkPass]: curr_vmrss: 677mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 (sg01) [SubgraphForkPass]: curr_vmrss: 677mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z USER 49414 (sg00) [SubgraphForkPass]: Running dead_code_elim +2025-08-07T13:54:35Z INFO 49414 (sg00) [SubgraphForkPass]: Inputs to dead_code_elim: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:35Z INFO 49414 (sg01) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20438 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:35Z USER 49414 (sg01) [SubgraphForkPass]: Running extend_shared_lifetimes +2025-08-07T13:54:35Z INFO 49414 (sg01) [SubgraphForkPass]: Inputs to extend_shared_lifetimes: modules=1 functions=1 allocs=4722 blocks=1 instructions=20438 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:35Z USER 49414 (sg01) [SubgraphForkPass]: extend_shared_lifetimes finished after 0.000 seconds +2025-08-07T13:54:35Z INFO 49414 (sg01) [SubgraphForkPass]: curr_vmrss: 677mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 (sg02) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z USER 49414 (sg02) [SubgraphForkPass]: Running extend_shared_lifetimes +2025-08-07T13:54:35Z INFO 49414 (sg01) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20438 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:35Z USER 49414 (sg01) [SubgraphForkPass]: Running dead_code_elim +2025-08-07T13:54:35Z INFO 49414 (sg01) [SubgraphForkPass]: Inputs to dead_code_elim: modules=1 functions=1 allocs=4722 blocks=1 instructions=20438 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:35Z INFO 49414 (sg02) [SubgraphForkPass]: Inputs to extend_shared_lifetimes: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z USER 49414 (sg02) [SubgraphForkPass]: extend_shared_lifetimes finished after 0.000 seconds +2025-08-07T13:54:35Z INFO 49414 (sg02) [SubgraphForkPass]: curr_vmrss: 677mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 (sg02) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z USER 49414 (sg02) [SubgraphForkPass]: Running dead_code_elim +2025-08-07T13:54:35Z INFO 49414 (sg02) [SubgraphForkPass]: Inputs to dead_code_elim: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z INFO 49414 (sg00) [DeadCodeElim]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:35Z USER 49414 (sg00) [SubgraphForkPass]: dead_code_elim finished after 0.026 seconds +2025-08-07T13:54:35Z INFO 49414 (sg00) [SubgraphForkPass]: curr_vmrss: 678mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 (sg00) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:35Z INFO 49414 (sg02) [DeadCodeElim]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:35Z INFO 49414 (sg01) [DeadCodeElim]: eliminateDeadStore removed 0 instructions +2025-08-07T13:54:35Z USER 49414 (sg01) [SubgraphForkPass]: dead_code_elim finished after 0.062 seconds +2025-08-07T13:54:35Z INFO 49414 (sg01) [SubgraphForkPass]: curr_vmrss: 678mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 (sg01) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20438 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:35Z USER 49414 (sg02) [SubgraphForkPass]: dead_code_elim finished after 0.072 seconds +2025-08-07T13:54:35Z INFO 49414 (sg02) [SubgraphForkPass]: curr_vmrss: 677mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 (sg02) [SubgraphForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z USER 49414 [SubgraphForkPass]: Compilation status: Total subgraphs: 3, Passed: 3, Failed: 0 +2025-08-07T13:54:35Z USER 49414 [BackendPassManager]: subgraph_parallel_pass finished after 0.077 seconds +2025-08-07T13:54:35Z INFO 49414 [BackendPassManager]: curr_vmrss: 677mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 [BackendPassManager]: Output has 3 module(s), 3 function(s), 20442 memory location(s), 3 block(s), and 89295 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z USER 49414 [BackendPassManager]: Running assign_hwdge_engine +2025-08-07T13:54:35Z INFO 49414 [BackendPassManager]: Inputs to assign_hwdge_engine: modules=3 functions=3 allocs=20442 blocks=3 instructions=89295 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z USER 49414 [BackendPassManager]: assign_hwdge_engine finished after 0.016 seconds +2025-08-07T13:54:35Z INFO 49414 [BackendPassManager]: curr_vmrss: 677mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 [BackendPassManager]: Output has 3 module(s), 3 function(s), 20442 memory location(s), 3 block(s), and 89295 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z USER 49414 [BackendPassManager]: Running mod_parallel_pass +2025-08-07T13:54:35Z INFO 49414 [BackendPassManager]: Inputs to mod_parallel_pass: modules=3 functions=3 allocs=20442 blocks=3 instructions=89295 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z USER 49414 (sg00) [ModuleForkPass]: Running alloc_queues +2025-08-07T13:54:35Z USER 49414 (sg02) [ModuleForkPass]: Running alloc_queues +2025-08-07T13:54:35Z USER 49414 (sg01) [ModuleForkPass]: Running alloc_queues +2025-08-07T13:54:35Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to alloc_queues: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:35Z INFO 49414 (sg00) [AllocQueues]: DMACopy transpose will be triggered from multiple engines +2025-08-07T13:54:35Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to alloc_queues: modules=1 functions=1 allocs=4722 blocks=1 instructions=20438 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:35Z INFO 49414 (sg01) [AllocQueues]: DMACopy transpose will be triggered from multiple engines +2025-08-07T13:54:35Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to alloc_queues: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:35Z INFO 49414 (sg02) [AllocQueues]: DMACopy transpose will be triggered from multiple engines +2025-08-07T13:54:35Z INFO 49414 (sg00) [AllocQueues]: Alloc Queue info: +┌───────────────────┬────────────────┬────────────┬────────────┬──────────────────┐ +│ Name │ DMAQueue::Type │ Engine │ Num Queues │ Num instructions │ +├───────────────────┼────────────────┼────────────┼────────────┼──────────────────┤ +│ qSPIO0 │ input │ SP │ 16 │ 2 │ +│ qPoolIO0 │ input │ Pool │ 16 │ 1 │ +│ qSPSpillReload0 │ data │ SP │ 16 │ 132 │ +│ qActSpillReload0 │ data │ Activation │ 16 │ 98 │ +│ qPoolSpillReload0 │ data │ Pool │ 16 │ 49 │ +│ qDVESpillReload0 │ data │ DVE │ 16 │ 15 │ +│ qPoolDynamic 
│ dynamic │ Pool │ 16 │ 264 │ +└───────────────────┴────────────────┴────────────┴────────────┴──────────────────┘ + +2025-08-07T13:54:35Z USER 49414 (sg00) [ModuleForkPass]: alloc_queues finished after 0.002 seconds +2025-08-07T13:54:35Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 677mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:35Z USER 49414 (sg00) [ModuleForkPass]: Running chain_dma_transposes +2025-08-07T13:54:35Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to chain_dma_transposes: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:35Z USER 49414 (sg00) [ModuleForkPass]: chain_dma_transposes finished after 0.000 seconds +2025-08-07T13:54:35Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 677mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:35Z USER 49414 (sg00) [ModuleForkPass]: Running prefetch_scheduling_after_sched +2025-08-07T13:54:35Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to prefetch_scheduling_after_sched: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:35Z USER 49414 (sg00) [ModuleForkPass]: prefetch_scheduling_after_sched finished after 0.000 seconds +2025-08-07T13:54:35Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 677mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:35Z USER 49414 (sg00) [ModuleForkPass]: Running lower_control +2025-08-07T13:54:35Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to lower_control: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:35Z INFO 49414 (sg01) [AllocQueues]: Alloc Queue info: +┌───────────────────┬────────────────┬────────────┬────────────┬──────────────────┐ +│ Name │ DMAQueue::Type │ Engine │ Num Queues │ Num instructions │ +├───────────────────┼────────────────┼────────────┼────────────┼──────────────────┤ +│ qSPIO0 │ input │ SP │ 16 │ 1 │ +│ qPoolIO0 │ input │ Pool │ 16 │ 1 │ +│ qSPSpillReload0 │ data │ SP │ 16 │ 118 │ +│ qActSpillReload0 │ data │ Activation │ 16 │ 112 │ +│ qDVESpillReload0 │ data │ DVE │ 16 │ 31 │ +│ qPoolSpillReload0 │ data │ Pool │ 16 │ 16 │ +│ qPoolDynamic │ dynamic │ Pool │ 16 │ 2059 │ +└───────────────────┴────────────────┴────────────┴────────────┴──────────────────┘ + +2025-08-07T13:54:35Z USER 49414 (sg01) [ModuleForkPass]: alloc_queues finished after 0.003 seconds +2025-08-07T13:54:35Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 677mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20438 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:35Z USER 49414 (sg01) [ModuleForkPass]: Running chain_dma_transposes +2025-08-07T13:54:35Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to chain_dma_transposes: modules=1 functions=1 allocs=4722 blocks=1 instructions=20438 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:35Z USER 49414 (sg01) [ModuleForkPass]: chain_dma_transposes finished after 0.000 seconds +2025-08-07T13:54:35Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 677mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20438 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:35Z USER 49414 (sg01) [ModuleForkPass]: Running prefetch_scheduling_after_sched +2025-08-07T13:54:35Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to prefetch_scheduling_after_sched: modules=1 functions=1 allocs=4722 blocks=1 instructions=20438 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:35Z USER 49414 (sg01) [ModuleForkPass]: prefetch_scheduling_after_sched finished after 0.000 seconds +2025-08-07T13:54:35Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 677mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20438 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:35Z USER 49414 (sg01) [ModuleForkPass]: Running lower_control +2025-08-07T13:54:35Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to lower_control: modules=1 functions=1 allocs=4722 blocks=1 instructions=20438 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:35Z INFO 49414 (sg02) [AllocQueues]: Alloc Queue info: +┌───────────────────┬────────────────┬────────────┬────────────┬──────────────────┐ +│ Name │ DMAQueue::Type │ Engine │ Num Queues │ Num instructions │ +├───────────────────┼────────────────┼────────────┼────────────┼──────────────────┤ +│ qSPIO0 │ input │ SP │ 16 │ 5 │ +│ qPoolIO0 │ input │ Pool │ 16 │ 1 │ +│ qSPSpillReload0 │ data │ SP │ 16 │ 85 │ +│ qActSpillReload0 │ data │ Activation │ 16 │ 680 │ +│ qDVESpillReload0 │ data │ DVE │ 16 │ 12 │ +│ qPoolSpillReload0 │ data │ Pool │ 16 │ 22 │ +│ qPoolDynamic │ dynamic │ Pool │ 16 │ 3015 │ +└───────────────────┴────────────────┴────────────┴────────────┴──────────────────┘ + +2025-08-07T13:54:35Z USER 49414 (sg02) [ModuleForkPass]: alloc_queues finished after 0.012 seconds +2025-08-07T13:54:35Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 677mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:35Z INFO 49414 (sg00) [LowerControl]: EraseInterBbDeps removed 0 inter-BB deps +2025-08-07T13:54:36Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s).
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:36Z USER 49414 (sg02) [ModuleForkPass]: Running chain_dma_transposes +2025-08-07T13:54:36Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to chain_dma_transposes: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:36Z USER 49414 (sg02) [ModuleForkPass]: chain_dma_transposes finished after 0.000 seconds +2025-08-07T13:54:36Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 677mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:36Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:36Z USER 49414 (sg02) [ModuleForkPass]: Running prefetch_scheduling_after_sched +2025-08-07T13:54:36Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to prefetch_scheduling_after_sched: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:36Z USER 49414 (sg02) [ModuleForkPass]: prefetch_scheduling_after_sched finished after 0.000 seconds +2025-08-07T13:54:36Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 677mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:36Z USER 49414 (sg00) [ModuleForkPass]: lower_control finished after 0.012 seconds +2025-08-07T13:54:36Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 677mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:36Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:36Z USER 49414 (sg02) [ModuleForkPass]: Running lower_control +2025-08-07T13:54:36Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). 
Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:36Z USER 49414 (sg00) [ModuleForkPass]: Running dep_reduction +2025-08-07T13:54:36Z INFO 49414 (sg00) [ModuleForkPass]: Inputs to dep_reduction: modules=1 functions=1 allocs=3899 blocks=1 instructions=8451 Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:36Z INFO 49414 (sg00) [DepReduction]: Start Dependency Reduction +2025-08-07T13:54:36Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to lower_control: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:36Z INFO 49414 (sg00) [DepReduction]: Processing async instrs... +2025-08-07T13:54:36Z INFO 49414 (sg00) [DepReduction]: Processing secondary edges per engine... +2025-08-07T13:54:36Z INFO 49414 (sg00) [DepReduction]: Processing secondary edges per engine, Done. Num edges removed 7187 +2025-08-07T13:54:36Z INFO 49414 (sg01) [LowerControl]: EraseInterBbDeps removed 0 inter-BB deps +2025-08-07T13:54:36Z INFO 49414 (sg00) [DepReduction]: Processing redundant descendants, Done. Num edges removed 7653 +2025-08-07T13:54:36Z INFO 49414 (sg00) [DepReduction]: Processing async instrs, Done. Num edges removed 7653 +2025-08-07T13:54:36Z USER 49414 (sg01) [ModuleForkPass]: lower_control finished after 0.032 seconds +2025-08-07T13:54:36Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 677mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:36Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20438 instruction(s). 
Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:36Z USER 49414 (sg01) [ModuleForkPass]: Running dep_reduction +2025-08-07T13:54:36Z INFO 49414 (sg01) [ModuleForkPass]: Inputs to dep_reduction: modules=1 functions=1 allocs=4722 blocks=1 instructions=20438 Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:36Z INFO 49414 (sg01) [DepReduction]: Start Dependency Reduction +2025-08-07T13:54:36Z INFO 49414 (sg01) [DepReduction]: Processing async instrs... +2025-08-07T13:54:36Z INFO 49414 (sg01) [DepReduction]: Processing secondary edges per engine... +2025-08-07T13:54:36Z INFO 49414 (sg01) [DepReduction]: Processing secondary edges per engine, Done. Num edges removed 25928 +2025-08-07T13:54:36Z INFO 49414 (sg00) [DepReduction]: Num Async removed: 0 +2025-08-07T13:54:36Z INFO 49414 (sg00) [DepReduction]: Finished dependency reduction: 44244 removed, new total 3760 +2025-08-07T13:54:36Z INFO 49414 (sg00) [DepReduction]: Finished Dependency Reduction +2025-08-07T13:54:36Z USER 49414 (sg00) [ModuleForkPass]: dep_reduction finished after 0.074 seconds +2025-08-07T13:54:36Z INFO 49414 (sg00) [ModuleForkPass]: curr_vmrss: 681mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:36Z INFO 49414 (sg00) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 3899 memory location(s), 1 block(s), and 8451 instruction(s). Max writers: 32 Max Readers: 1312 +2025-08-07T13:54:36Z INFO 49414 (sg02) [LowerControl]: EraseInterBbDeps removed 0 inter-BB deps +2025-08-07T13:54:36Z INFO 49414 (sg01) [DepReduction]: Processing redundant descendants, Done. Num edges removed 28202 +2025-08-07T13:54:36Z INFO 49414 (sg01) [DepReduction]: Processing async instrs, Done. 
Num edges removed 28202 +2025-08-07T13:54:36Z USER 49414 (sg02) [ModuleForkPass]: lower_control finished after 0.106 seconds +2025-08-07T13:54:36Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 681mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:36Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:36Z USER 49414 (sg02) [ModuleForkPass]: Running dep_reduction +2025-08-07T13:54:36Z INFO 49414 (sg02) [ModuleForkPass]: Inputs to dep_reduction: modules=1 functions=1 allocs=11821 blocks=1 instructions=60406 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:36Z INFO 49414 (sg02) [DepReduction]: Start Dependency Reduction +2025-08-07T13:54:36Z INFO 49414 (sg02) [DepReduction]: Processing async instrs... +2025-08-07T13:54:36Z INFO 49414 (sg02) [DepReduction]: Processing secondary edges per engine... +2025-08-07T13:54:36Z INFO 49414 (sg02) [DepReduction]: Processing secondary edges per engine, Done. Num edges removed 59058 +2025-08-07T13:54:36Z INFO 49414 (sg01) [DepReduction]: Num Async removed: 0 +2025-08-07T13:54:36Z INFO 49414 (sg01) [DepReduction]: Finished dependency reduction: 132848 removed, new total 6419 +2025-08-07T13:54:36Z INFO 49414 (sg01) [DepReduction]: Finished Dependency Reduction +2025-08-07T13:54:36Z USER 49414 (sg01) [ModuleForkPass]: dep_reduction finished after 0.204 seconds +2025-08-07T13:54:36Z INFO 49414 (sg01) [ModuleForkPass]: curr_vmrss: 698mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:36Z INFO 49414 (sg01) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 4722 memory location(s), 1 block(s), and 20438 instruction(s). Max writers: 48 Max Readers: 1904 +2025-08-07T13:54:36Z INFO 49414 (sg02) [DepReduction]: Processing redundant descendants, Done. Num edges removed 62932 +2025-08-07T13:54:36Z INFO 49414 (sg02) [DepReduction]: Processing async instrs, Done. 
Num edges removed 62932 +2025-08-07T13:54:36Z INFO 49414 (sg02) [DepReduction]: Num Async removed: 0 +2025-08-07T13:54:36Z INFO 49414 (sg02) [DepReduction]: Finished dependency reduction: 413818 removed, new total 18893 +2025-08-07T13:54:36Z INFO 49414 (sg02) [DepReduction]: Finished Dependency Reduction +2025-08-07T13:54:36Z USER 49414 (sg02) [ModuleForkPass]: dep_reduction finished after 0.673 seconds +2025-08-07T13:54:36Z INFO 49414 (sg02) [ModuleForkPass]: curr_vmrss: 724mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:36Z INFO 49414 (sg02) [ModuleForkPass]: Output has 1 module(s), 1 function(s), 11821 memory location(s), 1 block(s), and 60406 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:36Z USER 49414 [ModuleForkPass]: Compilation status: Total modules: 3, Passed: 3, Failed: 0 +2025-08-07T13:54:36Z USER 49414 [BackendPassManager]: mod_parallel_pass finished after 0.838 seconds +2025-08-07T13:54:36Z INFO 49414 [BackendPassManager]: curr_vmrss: 720mb, ru_maxrss: 739mb (delta=0mb) +2025-08-07T13:54:36Z INFO 49414 [BackendPassManager]: Output has 3 module(s), 3 function(s), 20442 memory location(s), 3 block(s), and 89295 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:36Z USER 49414 [BackendPassManager]: Running nc_parallel_pass +2025-08-07T13:54:36Z INFO 49414 [BackendPassManager]: Inputs to nc_parallel_pass: modules=3 functions=3 allocs=20442 blocks=3 instructions=89295 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:36Z USER 49414 [CoreForkPass]: Running bir_linker +2025-08-07T13:54:36Z INFO 49414 [CoreForkPass]: Inputs to bir_linker: modules=3 functions=3 allocs=20442 blocks=3 instructions=89295 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:36Z INFO 49414 (sgLnk) [BirLinker]: bir_linker cwd: +2025-08-07T13:54:36Z INFO 49414 (sgLnk) [BirLinker]: Num intermediates 111 +2025-08-07T13:54:36Z INFO 49414 (sgLnk) [BirLinker]: Num Module Definitions 3 +2025-08-07T13:54:36Z INFO 49414 (sgLnk) [BirLinker]: Linking to a call-graph structure +2025-08-07T13:54:36Z INFO 49414 (sgLnk) [BirLinker]: Added a new SpillReload Que qPoolPIOParam0 +2025-08-07T13:54:37Z INFO 49414 (sgLnk) [BirLinker]: tensor_map verification successful. +2025-08-07T13:54:37Z INFO 49414 (sgLnk) [BirLinker]: Writing updated tensor_map /home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/neuronxcc-dfnjq5y6/sgLnk/sg00/tensor_map.json +2025-08-07T13:54:37Z INFO 49414 (sgLnk) [BirLinker]: PostLink Stats: #MatMults 555691 #MatMult-Transposes 88323 +2025-08-07T13:54:37Z INFO 49414 (sgLnk) [BirLinker]: Total Intermediate MMTs 9776 #out: 9216 #inp: 560 #symmetric: 0 +2025-08-07T13:54:37Z INFO 49414 (sgLnk) [BirLinker]: Total Intermediate IOs with MMTs: 38 #out: 36 #inp: 2 #both: 0 +2025-08-07T13:54:37Z INFO 49414 (sgLnk) [BirLinker]: releasing pre-link modules +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [BirLinker]: linking Done. 
+2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: bir_linker finished after 1.193 seconds +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: curr_vmrss: 1152mb, ru_maxrss: 1152mb (delta=413mb) +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 89359 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: Running postlnk_dma_report +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Inputs to postlnk_dma_report: modules=1 functions=4 allocs=21101 blocks=4 instructions=89359 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DMAReport]: DMA Report: Bytes loaded or saved 1411019560, 81.1284% input load, 1.30048% output write, 17.5712% spill/reload +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: postlnk_dma_report finished after 0.011 seconds +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: curr_vmrss: 609mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 89359 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: Running report_stats +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Inputs to report_stats: modules=1 functions=4 allocs=21101 blocks=4 instructions=89359 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ReportStats]: Data Movement Statistics: main +┌─────────────┬──────┬───────┬───────┐ +│ Instruction │ Kind │ Count │ Bytes │ +└─────────────┴──────┴───────┴───────┘ + +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +└─────────────────────┴───────┘ + +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ReportStats]: Data Movement Statistics: sg0000 +┌──────────────┬────────────────────────────┬───────┬────────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├──────────────┼────────────────────────────┼───────┼────────────┤ +│ DMACopy │ ExternalInput -> Internal │ 17 │ 9957281792 │ +│ DMACopy │ Internal -> ExternalOutput │ 64 │ 67108864 │ +│ DMACopy │ Internal -> Output │ 1 │ 16777216 │ +│ Load │ Const -> Internal │ 3 │ 65792 │ +│ Load │ ExternalInput -> Internal │ 148 │ 58733056 │ +│ Load │ Internal │ 178 │ 45367296 │ +│ Save │ Internal │ 62 │ 16252928 │ +│ Save │ Internal -> Output │ 37 │ 9961474 │ +│ Save (Spill) │ Internal │ 51 │ 11681792 │ +└──────────────┴────────────────────────────┴───────┴────────────┘ + +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 3 │ +│ 64 │ 1 │ +│ 256 │ 3 │ +│ 512 │ 1 │ +│ 896 │ 6 │ +│ 1024 │ 46 │ +│ 1920 │ 32 │ +│ 2048 │ 288 │ +│ 4096 │ 116 │ +│ 262144 │ 64 │ +│ 8388608 │ 2 │ +└─────────────────────┴───────┘ + +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ReportStats]: Data Movement Statistics: sg0001 +┌──────────────┬────────────────────────────┬───────┬───────────┐ +│ Instruction │ Kind │ Count │ Bytes │ 
+├──────────────┼────────────────────────────┼───────┼───────────┤ +│ DMACopy │ Input -> Internal │ 1 │ 25165824 │ +│ DMACopy │ Internal -> ExternalOutput │ 64 │ 67108864 │ +│ DMACopy │ Internal -> Output │ 1 │ 16777216 │ +│ Load │ Const -> Internal │ 2 │ 65536 │ +│ Load │ ExternalInput -> Internal │ 1972 │ 260063744 │ +│ Load │ Input -> Internal │ 6 │ 2097152 │ +│ Load │ Internal │ 132 │ 47448064 │ +│ Save │ Internal │ 99 │ 31195136 │ +│ Save │ Internal -> Output │ 17 │ 8388610 │ +│ Save (Spill) │ Internal │ 44 │ 13041664 │ +└──────────────┴────────────────────────────┴───────┴───────────┘ + +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 3 │ +│ 64 │ 2 │ +│ 256 │ 1538 │ +│ 1024 │ 73 │ +│ 2048 │ 156 │ +│ 4096 │ 500 │ +│ 262144 │ 64 │ +│ 8388608 │ 5 │ +└─────────────────────┴───────┘ + +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ReportStats]: Data Movement Statistics: sg0002 +┌──────────────┬────────────────────────────┬───────┬───────────┐ +│ Instruction │ Kind │ Count │ Bytes │ +├──────────────┼────────────────────────────┼───────┼───────────┤ +│ DMACopy │ Input -> Internal │ 1 │ 25165824 │ +│ DMACopy │ Internal │ 1 │ 8388608 │ +│ Load │ Const -> Internal │ 4 │ 34824 │ +│ Load │ ExternalInput -> Internal │ 3018 │ 823676940 │ +│ Load │ Internal │ 100 │ 43833096 │ +│ Save │ Internal │ 675 │ 28626692 │ +│ Save │ Internal -> ExternalOutput │ 1 │ 4 │ +│ Save (Spill) │ Internal │ 20 │ 10485760 │ +└──────────────┴────────────────────────────┴───────┴───────────┘ + +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ReportStats]: +┌─────────────────────┬───────┐ +│ Bytes per partition │ Count │ +├─────────────────────┼───────┤ +│ 2 │ 1 │ +│ 4 │ 9 │ +│ 8 │ 2 │ +│ 16 │ 3 │ +│ 64 │ 2 │ +│ 256 │ 1538 │ +│ 512 │ 593 │ +│ 1024 │ 14 │ +│ 2048 │ 34 │ +│ 4096 │ 1618 │ +│ 60768 │ 1 │ +│ 60776 │ 4 │ +│ 8388608 │ 3 │ +└─────────────────────┴───────┘ + +2025-08-07T13:54:38Z INFO 49414 
(sgLnk) [ReportStats]: MM Stats: #MatMults 67451 #MatMult-Transposes 23587 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ReportStats]: IO Tensor size combined: 9981025324 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ReportStats]: IO Tensor Statistics: +┌────────────────────┬───────────────┬──────────┬──────────────┐ +│ Largest IO Tensors │ Kind │ Src Type │ Size (Bytes) │ +├────────────────────┼───────────────┼──────────┼──────────────┤ +│ input76_sg0000 │ ExternalInput │ bfloat16 │ 622329856 │ +│ input473_sg0002 │ ExternalInput │ bfloat16 │ 622329856 │ +│ input76 │ ExternalInput │ bfloat16 │ 622329856 │ +│ input473 │ ExternalInput │ bfloat16 │ 622329856 │ +│ input131 │ ExternalInput │ bfloat16 │ 50331648 │ +│ input109 │ ExternalInput │ bfloat16 │ 50331648 │ +│ input98 │ ExternalInput │ bfloat16 │ 50331648 │ +│ input153 │ ExternalInput │ bfloat16 │ 50331648 │ +│ input87 │ ExternalInput │ bfloat16 │ 50331648 │ +│ input175 │ ExternalInput │ bfloat16 │ 50331648 │ +└────────────────────┴───────────────┴──────────┴──────────────┘ + +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ReportStats]: Large (Internal) Tensor Statistics: +┌─────────────────┬───────────────────┬──────────┬──────────────┐ +│ Largest Tensors │ Kind │ Src Type │ Size (Bytes) │ +├─────────────────┼───────────────────┼──────────┼──────────────┤ +│ intermediate1 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate4 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate18 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate9 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate15 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate12 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate27 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate24 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate21 │ InternalInterface │ bfloat16 │ 8388608 │ +│ intermediate6 │ InternalInterface │ bfloat16 │ 8388608 │ +└─────────────────┴───────────────────┴──────────┴──────────────┘ + 
+2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: report_stats finished after 0.023 seconds +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: curr_vmrss: 609mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 89359 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: Running coloring_allocator_dram_post_lnk +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Inputs to coloring_allocator_dram_post_lnk: modules=1 functions=4 allocs=21101 blocks=4 instructions=89359 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: allocating spills in DRAM post_link mode for address space Local +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: reserved space = 8342046740 bytes +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: spill space = 605552712 bytes +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: aligned spill space = 605700096 bytes +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: dram space = 107374182400 bytes +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: renumber locations +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: size = 111 +2025-08-07T13:54:38Z INFO 49414 []: find first defs for local +2025-08-07T13:54:38Z INFO 49414 []: find first defs for global +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: Num intervals 111 Num locations 111 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: IntervalTree Build Done +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: info.neighbors init Done +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: IntervalTree readback Done +2025-08-07T13:54:38Z INFO 49414 (sgLnk) 
[DRAM_Allocator]: simplify interference graph +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: initialize low and high +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: lo = 111 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: hi = 0 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: total = 111 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: simplify +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: new candidates = 0 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: Already used DRAM hwm: 55443456 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: select ranges +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: CC buffer size limit 524288000 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: allreduce_dram_hwm 55443456 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: Real CC buffer size 55443456 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: DRAM hwm after allocation: 98971648 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [DRAM_Allocator]: DRAM allocation successful +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: coloring_allocator_dram_post_lnk finished after 0.064 seconds +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: curr_vmrss: 609mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 89359 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: Running memory_analysis_after_coloring_allocator_dram_post_lnk +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Inputs to memory_analysis_after_coloring_allocator_dram_post_lnk: modules=1 functions=4 allocs=21101 blocks=4 instructions=89359 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: memory_analysis_after_coloring_allocator_dram_post_lnk finished after 0.038 seconds +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: curr_vmrss: 610mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 89359 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: Running lower_dynamic_dma +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Inputs to lower_dynamic_dma: modules=1 functions=4 allocs=21101 blocks=4 instructions=89359 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: lower_dynamic_dma finished after 0.012 seconds +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: curr_vmrss: 610mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 89359 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: Running legalize_dynamic_dma +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Inputs to legalize_dynamic_dma: modules=1 functions=4 allocs=21101 blocks=4 instructions=89359 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [LegalizeDynamicDMA]: Legalize Dynamic DMA scanned 1 DGE instructions +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [LegalizeDynamicDMA]: After Legalize Dynamic DMA, 1 DGE instructions were scanned +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [LegalizeDynamicDMA]: +┌───────────┬───────────────────────────────┬────────────────────────────┐ +│ Sub-Pass │ Illegal Instructions Detected │ New Instructions Generated │ +├───────────┼───────────────────────────────┼────────────────────────────┤ +│ Peeling │ 0 │ 0 │ +│ Unrolling │ 0 │ 0 │ +│ Splitting │ 0 │ 0 │ +└───────────┴───────────────────────────────┴────────────────────────────┘ + +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: legalize_dynamic_dma finished after 0.030 seconds +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: curr_vmrss: 610mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 89359 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: Running lower_dma +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Inputs to lower_dma: modules=1 functions=4 allocs=21101 blocks=4 instructions=89359 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [LowerDMA]: lower_dma metrics start + IO + Copy (DGE/DMA) + 128 partition : 72840/72840 (100% DGE) + power-of-2 partition : 72878/72919 (99.9438% DGE) + > 3 dimensional : 0/0 + non-integer desc size : 0/0 + total : 72878/72919 (99.9438% DGE) + Cast (DGE/DMA) + 128 partition : 145/145 (100% DGE) + power-of-2 partition : 145/146 (99.3151% DGE) + > 3 dimensional : 0/0 + non-integer desc size : 0/0 + total : 145/146 (99.3151% DGE) + Spill/Reload + Copy (DGE/DMA) + 128 partition : 0/9782 (0% DGE) + power-of-2 partition : 0/10788 (0% DGE) + > 3 dimensional : 0/0 + non-integer desc size : 0/0 + total : 0/10788 (0% DGE) + Cast (DGE/DMA) + 128 partition : 0/0 + power-of-2 partition : 0/0 + > 3 dimensional : 0/0 + non-integer desc size : 0/0 + total : 0/0 + CopyMode + CCE : 36 + Transpose : 1 + Replicate : 0 + Dynamic (DGE/DMA) + scalar : 1/1 (100% DGE) + vector : 2320/2320 (100% DGE) + Opcode + ReadVarAddr : 0 + IndirectLoad : 0 + IndirectSave : 0 + IndirectSaveAccumulate : 0 + DstReduceDGE : 0 +lower_dma metrics end +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: lower_dma finished after 0.127 seconds +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: curr_vmrss: 610mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 89359 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: Running expand_all_engine +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Inputs to expand_all_engine: modules=1 functions=4 allocs=21101 blocks=4 instructions=89359 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: expand_all_engine finished after 0.013 seconds +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: curr_vmrss: 610mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 89359 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: Running alloc_semaphores +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Inputs to alloc_semaphores: modules=1 functions=4 allocs=21101 blocks=4 instructions=89359 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: alloc_semaphores finished after 0.074 seconds +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: curr_vmrss: 610mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 89359 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: Running expand_inst_late +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Inputs to expand_inst_late: modules=1 functions=4 allocs=21101 blocks=4 instructions=89359 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: expand_inst_late finished after 0.069 seconds +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: curr_vmrss: 610mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 89634 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: Running seq_inst_opt +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Inputs to seq_inst_opt: modules=1 functions=4 allocs=21101 blocks=4 instructions=89634 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [SeqInstOpt]: Removing 0 unnecessary InstRegisterMove instruction(s) from Block1 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [SeqInstOpt]: Removing 205 unnecessary InstRegisterMove instruction(s) from Block1 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [SeqInstOpt]: Removing 63 unnecessary InstRegisterMove instruction(s) from Block1 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [SeqInstOpt]: Removing 0 unnecessary InstRegisterMove instruction(s) from Block1 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: seq_inst_opt finished after 0.010 seconds +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: curr_vmrss: 610mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 89366 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: Running lower_sync +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Inputs to lower_sync: modules=1 functions=4 allocs=21101 blocks=4 instructions=89366 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: lower_sync finished after 0.043 seconds +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: curr_vmrss: 617mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 95237 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: Running lower_act +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Inputs to lower_act: modules=1 functions=4 allocs=21101 blocks=4 instructions=95237 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: lower_act finished after 0.011 seconds +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: curr_vmrss: 617mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 95266 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: Running lower_dve +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Inputs to lower_dve: modules=1 functions=4 allocs=21101 blocks=4 instructions=95266 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [LowerDVE]: Loading DVE opcodes table dve_info.json from /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/dve/dve_bin_gen2/dve_info.json +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: lower_dve finished after 0.108 seconds +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: curr_vmrss: 626mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 95266 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: Running lower_ap +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Inputs to lower_ap: modules=1 functions=4 allocs=21101 blocks=4 instructions=95266 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: lower_ap finished after 0.016 seconds +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: curr_vmrss: 626mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 95266 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: Running coloring_allocator_reg +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Inputs to coloring_allocator_reg: modules=1 functions=4 allocs=21101 blocks=4 instructions=95266 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: allocating REG +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: main loop iteration 1 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: allocating REG +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: main loop iteration 1 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: renumber registers +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: size = 3 +2025-08-07T13:54:38Z INFO 49414 []: find first defs for local reg +2025-08-07T13:54:38Z INFO 49414 []: find first defs for global reg +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: live range analysis +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: find costs 
+2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: simplify interference graph +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: initialize low and high +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: lo = 3 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: hi = 0 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: inf = 0 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: total = 3 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: simplify +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: new candidates = 0 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: select ranges +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: no more spills +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: REG score = 0 (lower is better) +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: Spilling from REG cost about 0 cycles +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: 0% REG utilization after allocation +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: allocating REG +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: main loop iteration 1 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: renumber registers +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: size = 1 +2025-08-07T13:54:38Z INFO 49414 []: find first defs for local reg +2025-08-07T13:54:38Z INFO 49414 []: find first defs for global reg +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: live range analysis +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: find costs +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: simplify interference graph +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: initialize low and high +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: lo = 1 
+2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: hi = 0 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: inf = 0 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: total = 1 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: simplify +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: new candidates = 0 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: select ranges +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: no more spills +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: REG score = 0 (lower is better) +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: Spilling from REG cost about 0 cycles +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: 0% REG utilization after allocation +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ColoringAllocator::Rep]: Allocating functions +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [ColoringAllocator::Rep]: linearize and check +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: allocating REG +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: main loop iteration 1 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: renumber registers +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: size = 4 +2025-08-07T13:54:38Z INFO 49414 []: find first defs for local reg +2025-08-07T13:54:38Z INFO 49414 []: find first defs for global reg +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: live range analysis +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: find costs +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: simplify interference graph +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: initialize low and high +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: lo = 4 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: hi = 0 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: inf = 0 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: total = 4 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) 
[REG_Allocator]: simplify +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: new candidates = 0 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: select ranges +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: no more spills +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: REG score = 0 (lower is better) +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: Spilling from REG cost about 0 cycles +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [REG_Allocator]: 0% REG utilization after allocation +2025-08-07T13:54:38Z USER 49414 [CoreForkPass]: coloring_allocator_reg finished after 0.108 seconds +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: curr_vmrss: 629mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:38Z INFO 49414 [CoreForkPass]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 95266 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [BackendPassManager]: nc_parallel_pass finished after 2.039 seconds +2025-08-07T13:54:38Z INFO 49414 [BackendPassManager]: curr_vmrss: 629mb, ru_maxrss: 1152mb (delta=413mb) +2025-08-07T13:54:38Z INFO 49414 [BackendPassManager]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 95266 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [BackendPassManager]: Running mod_parallel_pass +2025-08-07T13:54:38Z INFO 49414 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=4 allocs=21101 blocks=4 instructions=95266 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [ModuleForkPass]: Running birverifier +2025-08-07T13:54:38Z INFO 49414 [ModuleForkPass]: Inputs to birverifier: modules=1 functions=4 allocs=21101 blocks=4 instructions=95266 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [ModuleForkPass]: birverifier finished after 0.091 seconds +2025-08-07T13:54:38Z INFO 49414 [ModuleForkPass]: curr_vmrss: 638mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:38Z INFO 49414 [ModuleForkPass]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 95266 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [BackendPassManager]: mod_parallel_pass finished after 0.094 seconds +2025-08-07T13:54:38Z INFO 49414 [BackendPassManager]: curr_vmrss: 638mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:38Z INFO 49414 [BackendPassManager]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 95266 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [BackendPassManager]: Running subgraph_parallel_pass +2025-08-07T13:54:38Z INFO 49414 [BackendPassManager]: Inputs to subgraph_parallel_pass: modules=1 functions=4 allocs=21101 blocks=4 instructions=95266 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [SubgraphForkPass]: Running lnc_verifier +2025-08-07T13:54:38Z INFO 49414 [SubgraphForkPass]: Inputs to lnc_verifier: modules=1 functions=4 allocs=21101 blocks=4 instructions=95266 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [SubgraphForkPass]: lnc_verifier finished after 0.001 seconds +2025-08-07T13:54:38Z INFO 49414 [SubgraphForkPass]: curr_vmrss: 638mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:38Z INFO 49414 [SubgraphForkPass]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 95266 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [BackendPassManager]: subgraph_parallel_pass finished after 0.002 seconds +2025-08-07T13:54:38Z INFO 49414 [BackendPassManager]: curr_vmrss: 638mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:38Z INFO 49414 [BackendPassManager]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 95266 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [BackendPassManager]: Running mod_parallel_pass +2025-08-07T13:54:38Z INFO 49414 [BackendPassManager]: Inputs to mod_parallel_pass: modules=1 functions=4 allocs=21101 blocks=4 instructions=95266 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z USER 49414 [ModuleForkPass]: Running codegen +2025-08-07T13:54:38Z INFO 49414 [ModuleForkPass]: Inputs to codegen: modules=1 functions=4 allocs=21101 blocks=4 instructions=95266 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [Codegen]: Total compiler allocated DRAM tensors: 0.0921745 GB +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [Codegen]: Total un-allocated DRAM tensors by kind: +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [Codegen]: +┌────────────────┬─────────────┐ +│ TensorKind │ Size (GB) │ +├────────────────┼─────────────┤ +│ ExternalInput │ 7.62851 │ +│ ExternalOutput │ 0.0703125 │ +│ Const │ 0.000154741 │ +└────────────────┴─────────────┘ + +2025-08-07T13:54:38Z INFO 49414 (sgLnk) [Codegen]: Total runtime managed DRAM tensors: 7.69898 GB +2025-08-07T13:54:39Z INFO 49414 (sgLnk) [Codegen]: Instruction Stats: +2025-08-07T13:54:39Z INFO 49414 (sgLnk) [Codegen]: +┌─────────────────────┬───────┐ +│ Opcode │ Count │ +├─────────────────────┼───────┤ +│ MATMUL │ 67459 │ +│ LDWEIGHTS │ 67079 │ +│ ACTIVATE │ 10464 │ +│ EVENT_SEMAPHORE │ 5871 │ +│ UNKNOWN(0xd4) │ 5338 │ +│ PSEUDO_DMA_TRIGGER │ 1381 │ +│ TENSOR_TENSOR │ 1021 │ +│ UNKNOWN(0x24) │ 448 │ +│ UNKNOWN(0x8d) │ 448 │ +│ MATCH_VALUE_LOAD │ 441 │ +│ TENSOR_SCALAR_ADDR │ 341 │ +│ UNKNOWN(0xe8) │ 260 │ +│ UNKNOWN(0x8b) │ 240 │ +│ COPY │ 238 │ +│ FIND_INDEX8 │ 224 │ +│ MAX8 │ 224 │ +│ MATCH_REPLACE8 │ 217 │ +│ TENSOR_SCALAR │ 195 │ +│ UNKNOWN(0xd3) │ 185 │ +│ MEMSET │ 175 │ +│ UNKNOWN(0xda) │ 153 │ +│ TENSOR_REDUCE │ 140 │ +│ UNKNOWN(0x8a) │ 128 │ +│ UNKNOWN(0x92) │ 128 │ +│ GATHER │ 99 │ +│ POOL_BUFFER_LOAD │ 99 │ +│ CAST │ 97 │ +│ RECIPROCAL │ 67 │ +│ ACT_TABLE_LOAD │ 29 │ 
+│ PSEUDO_BRANCH_LABEL │ 20 │ +│ IOTA │ 19 │ +│ UNKNOWN(0xd2) │ 15 │ +│ PSEUDO_DMA_REARM │ 12 │ +│ UNKNOWN(0xcf) │ 12 │ +│ UNKNOWN(0xd9) │ 8 │ +│ MOVE │ 4 │ +│ STREAM_SHUFFLE │ 4 │ +│ LOAD_MASK_SELECT │ 4 │ +│ ALU_OP │ 2 │ +│ UNKNOWN(0xe5) │ 2 │ +│ PSEUDO_TENSOR_LOAD │ 1 │ +│ TENSOR_SCALAR │ 1 │ +│ RNG │ 1 │ +└─────────────────────┴───────┘ + +2025-08-07T13:54:39Z INFO 49414 (sgLnk) [Codegen]: +┌────────────┬────────┐ +│ Engine │ Count │ +├────────────┼────────┤ +│ Unassigned │ 0 │ +│ GPSIMD │ 8796 │ +│ Scalar │ 12988 │ +│ Tensor │ 136451 │ +│ SyncDMA │ 0 │ +│ Vector │ 4470 │ +│ Sync │ 609 │ +│ All │ 0 │ +└────────────┴────────┘ + +2025-08-07T13:54:39Z INFO 49414 (sgLnk) [Codegen]: Total instructions: 163314 (0.00973427 GB) +2025-08-07T13:54:39Z INFO 49414 (sgLnk) [Codegen]: Total DynamicDMA instruction count: 5338 +2025-08-07T13:54:39Z USER 49414 (sgLnk) [Codegen]: isa_gen finished after 0.357 seconds +2025-08-07T13:54:39Z INFO 49414 (sgLnk) [Codegen]: Number of DMA descriptors on each queue instance: +┌───────────────────────────┬────────────────┐ +│ Queue Instance │ RT Descriptors │ +├───────────────────────────┼────────────────┤ +│ qActSpillReload0_defId_0 │ 25088 │ +│ qActSpillReload0_defId_1 │ 28672 │ +│ qActSpillReload0_defId_2 │ 22188 │ +│ qDVESpillReload0_defId_0 │ 3840 │ +│ qDVESpillReload0_defId_1 │ 6528 │ +│ qDVESpillReload0_defId_2 │ 2056 │ +│ qPoolIO0 │ 2 │ +│ qPoolPIOParam0 │ 72 │ +│ qPoolSpillReload0_defId_0 │ 16896 │ +│ qPoolSpillReload0_defId_1 │ 4096 │ +│ qPoolSpillReload0_defId_2 │ 4870 │ +│ qSPIO0 │ 147610 │ +│ qSPSpillReload0_defId_0 │ 33538 │ +│ qSPSpillReload0_defId_1 │ 30208 │ +│ qSPSpillReload0_defId_2 │ 17950 │ +└───────────────────────────┴────────────────┘ + +Total descriptors: 343614 (0.00512025 GB) +2025-08-07T13:54:39Z INFO 49414 (sgLnk) [Codegen]: Number of DMA engines used by each queue: +┌───────────────────┬──────────────────────┐ +│ Queue │ DMA Engines │ +├───────────────────┼──────────────────────┤ +│ qSPIO0 │ 16 │ +│ 
qSPSpillReload0 │ 16 │ +│ qPoolDynamic │ 16 │ +│ qActSpillReload0 │ 16 │ +│ qPoolSpillReload0 │ 16 │ +│ qDVESpillReload0 │ 16 │ +│ qPoolIO0 │ 16 │ +│ qPoolPIOParam0 │ 16 │ +├───────────────────┼──────────────────────┤ +│ TOTAL │ 128 (must be <= 176) │ +└───────────────────┴──────────────────────┘ + +2025-08-07T13:54:39Z INFO 49414 (sgLnk) [Codegen]: Tensors with largest descriptor count: +┌────────────────────────────┬──────────┬──────────┬──────────────────┐ +│ Tensor Name │ Kind │ Src Type │ Descriptor Count │ +├────────────────────────────┼──────────┼──────────┼──────────────────┤ +│ add.9_sg0002 │ Internal │ bfloat16 │ 17 │ +│ all_gather.1_i0_sg0000 │ Internal │ bfloat16 │ 24 │ +│ all_gather.1_i1_sg0000 │ Internal │ bfloat16 │ 25 │ +│ dot.11-buffer-1831_sg0001 │ Internal │ bfloat16 │ 32 │ +│ dot.4-buffer-2238_sg0000 │ Internal │ bfloat16 │ 32 │ +│ dot.14-buffer-2823_sg0002 │ Internal │ bfloat16 │ 32 │ +│ dot.7-buffer-1826_sg0001 │ Internal │ bfloat16 │ 32 │ +│ all-reduce.519.1841_sg0001 │ Internal │ bfloat16 │ 35 │ +│ add.4_sg0001 │ Internal │ bfloat16 │ 51 │ +│ convert.59_sg0002 │ Internal │ float32 │ 599 │ +└────────────────────────────┴──────────┴──────────┴──────────────────┘ + +2025-08-07T13:54:39Z USER 49414 (sgLnk) [Codegen]: dma_desc_gen finished after 0.026 seconds +2025-08-07T13:54:39Z INFO 49414 (sgLnk) [Codegen]: Estimated peak DRAM usage: 7.80601 GB +2025-08-07T13:54:39Z INFO 49414 (sgLnk) [Codegen]: Generating debug info +2025-08-07T13:54:39Z WARNING 49414 (sgLnk) [Codegen]: Found 127 instructions with more than 100 dependencies. For each such instruction, skipping writing more than 100 dependencies into the built-in NEFF debug info to prevent excessive compile time and NEFF size. For those instructions, the Neuron profiler will not display the skipped dependencies. 
+2025-08-07T13:54:39Z USER 49414 (sgLnk) [Codegen]: debug_info_gen finished after 0.205 seconds +2025-08-07T13:54:39Z USER 49414 [ModuleForkPass]: codegen finished after 0.618 seconds +2025-08-07T13:54:39Z INFO 49414 [ModuleForkPass]: curr_vmrss: 721mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:39Z INFO 49414 [ModuleForkPass]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 95266 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:39Z USER 49414 [BackendPassManager]: mod_parallel_pass finished after 0.622 seconds +2025-08-07T13:54:39Z INFO 49414 [BackendPassManager]: curr_vmrss: 721mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:39Z INFO 49414 [BackendPassManager]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 95266 instruction(s). Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:39Z USER 49414 [BackendPassManager]: Running neff_packager +2025-08-07T13:54:39Z INFO 49414 [BackendPassManager]: Inputs to neff_packager: modules=1 functions=4 allocs=21101 blocks=4 instructions=95266 Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:39Z INFO 49414 [NeffPackager]: FileDeDuper file not found value_sg0000_constant.9-1360_CRSM.npy +2025-08-07T13:54:39Z INFO 49414 [NeffPackager]: FileDeDuper file not found value_sg0000_identity_1547_CRSM.npy +2025-08-07T13:54:39Z INFO 49414 [NeffPackager]: FileDeDuper file not found value_sg0000_t2261_CRSM.npy +2025-08-07T13:54:39Z INFO 49414 [NeffPackager]: FileDeDuper file not found value_sg0001_identity_1184_CRSM.npy +2025-08-07T13:54:39Z INFO 49414 [NeffPackager]: FileDeDuper file not found value_sg0001_t1844_CRSM.npy +2025-08-07T13:54:39Z INFO 49414 [NeffPackager]: FileDeDuper file not found value_sg0002_constant.24_CRSM.npy +2025-08-07T13:54:39Z INFO 49414 [NeffPackager]: FileDeDuper file not found value_sg0002_constant.25_CRSM.npy +2025-08-07T13:54:39Z INFO 49414 [NeffPackager]: FileDeDuper file not found 
value_sg0002_constant.26-822-934_CRSM.npy +2025-08-07T13:54:39Z INFO 49414 [NeffPackager]: FileDeDuper file not found value_sg0002_identity_1077_CRSM.npy +2025-08-07T13:54:39Z INFO 49414 [NeffPackager]: Const File de-dup saved 0 KB of memory footprint +2025-08-07T13:54:39Z WARNING 49414 [NeffFileWriter]: writeKelp missing file /local/p4clients/pkgbuild-const/workspace/build/KaenaCompiler/KaenaCompiler-2.x.169490.0/AL2_x86_64/DEV.STD.PTHREAD/build/private/_skbuild/linux-x86_64-3.10/cmake-build/neuronxcc/walrus/neff_packager/MetricMetadata.json +2025-08-07T13:54:39Z INFO 49414 [NeffFileWriter]: Neff will be written to: /home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/model.MODULE_b3ddbc97e5f0d1d64c82+155de413.neff +2025-08-07T13:54:39Z INFO 49414 [NeffFileWriter]: IR signature: d0aab7e369a46fb7143fb478eb9b019f for neff artifacts +2025-08-07T13:54:39Z USER 49414 [BackendPassManager]: neff_packager finished after 0.135 seconds +2025-08-07T13:54:39Z INFO 49414 [BackendPassManager]: curr_vmrss: 721mb, ru_maxrss: 1152mb (delta=0mb) +2025-08-07T13:54:39Z INFO 49414 [BackendPassManager]: Output has 1 module(s), 4 function(s), 21101 memory location(s), 4 block(s), and 95266 instruction(s). 
Max writers: 594 Max Readers: 20371 +2025-08-07T13:54:39Z INFO 49414 [BackendDriver]: HBM scratchpad usage summary (post-allocation): +┌──────┬───────────┬────────────────────────────────────────────────────────────┬─────────────┐ +│ Core │ Subgraph │ Description │ Value │ +├──────┼───────────┼────────────────────────────────────────────────────────────┼─────────────┤ +│ nc00 │ sg00 │ Peak scratchpad usage: local │ 0.036621 GB │ +│ nc00 │ sg00 │ Total size of allocated tensors: local │ 0.036621 GB │ +│ nc00 │ sg01 │ Peak scratchpad usage: local │ 0.051636 GB │ +│ nc00 │ sg01 │ Total size of allocated tensors: local │ 0.062500 GB │ +│ nc00 │ sg02 │ Peak scratchpad usage: local │ 0.036148 GB │ +│ nc00 │ sg02 │ Total size of allocated tensors: local │ 0.050140 GB │ +│ nc00 │ Max │ Peak scratchpad usage: local │ 0.051636 GB │ +│ nc00 │ Post-link │ Peak scratchpad usage after intermediate tensor allocation │ 0.092175 GB │ +│ nc00 │ Post-link │ Total size of allocated intermediate tensors │ 0.564102 GB │ +├──────┼───────────┼────────────────────────────────────────────────────────────┼─────────────┤ +│ Max │ Max │ Peak scratchpad usage │ 0.092175 GB │ +│ Max │ Max │ Peak scratchpad usage (page-aligned) │ 0.500000 GB │ +└──────┴───────────┴────────────────────────────────────────────────────────────┴─────────────┘ + +2025-08-07T13:54:39Z INFO 49414 [BackendDriver]: Backend completed successfully, tearing down. 
+2025-08-07T13:54:39Z INFO 48502 [job.WalrusDriver.0]: new_lnkState: {"model": ["/home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/model.MODULE_b3ddbc97e5f0d1d64c82+155de413.hlo_module.pb"], "tensormap": "tensor_map.json", "bir": "walrus_bir.out.json", "lorean_sg_key": null, "input_name_map": null, "output_name_map": null, "constant_tensors": null, "cached_wavegraph": "walrus_bir.out.json", "state_dir": "/home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/neuronxcc-dfnjq5y6/sgLnk/sg00", "state_id": "sgLnk"} +2025-08-07T13:54:39Z INFO 48502 [job.WalrusDriver.0]: MTBackend: completed successfully. +2025-08-07T13:54:39Z INFO 48502 [pipeline.Pipeline.0]: Finished job job.WalrusDriver.0 +2025-08-07T13:54:39Z INFO 48502 [pipeline.Pipeline.0]: Starting job job.BIRLinker.0 +2025-08-07T13:54:39Z INFO 48502 [job.BIRLinker.0]: Replay this job by calling: /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/bin/neuronx-cc compile --framework XLA --state '{"model": ["/home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/model.MODULE_b3ddbc97e5f0d1d64c82+155de413.hlo_module.pb"], "tensormap": "tensor_map.json", "bir": "walrus_bir.out.json", "lorean_sg_key": null, "input_name_map": null, "output_name_map": null, "constant_tensors": null, "cached_wavegraph": "walrus_bir.out.json", "state_dir": "/home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/neuronxcc-dfnjq5y6/sgLnk/sg00", "state_id": "sgLnk"}' --pipeline BIRLinker +2025-08-07T13:54:39Z INFO 48502 [job.BIRLinker.0]: BIRLinker cwd: /home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/neuronxcc-dfnjq5y6 +2025-08-07T13:54:39Z INFO 48502 [job.BIRLinker.0]: Linking already done. 
+2025-08-07T13:54:39Z INFO 48502 [pipeline.Pipeline.0]: Finished job job.BIRLinker.0 +2025-08-07T13:54:39Z INFO 48502 [pipeline.Pipeline.0]: Starting job job.Kelper.0 +2025-08-07T13:54:39Z INFO 48502 [job.Kelper.0]: Skipping neff generation which was already performed by neff_packager +2025-08-07T13:54:39Z INFO 48502 [pipeline.Pipeline.0]: Finished job job.Kelper.0 +2025-08-07T13:54:39Z INFO 48502 [pipeline.Pipeline.0]: Starting job job.NeffWrapper.0 +2025-08-07T13:54:39Z INFO 48502 [job.NeffWrapper.0]: Job NeffWrapper len(in_states) 1 +2025-08-07T13:54:39Z INFO 48502 [job.NeffWrapper.0]: Processing input #0 +2025-08-07T13:54:39Z INFO 48502 [job.NeffWrapper.0]: Start NeffWrapper +2025-08-07T13:54:39Z INFO 48502 [job.NeffWrapper.0]: Executing: /opt/aws_neuronx_venv_pytorch_2_7_nxd_inference/lib/python3.10/site-packages/neuronxcc/starfish/bin/hlo-neff-wrapper --hlo /home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/model.MODULE_b3ddbc97e5f0d1d64c82+155de413.hlo_module.pb --neff /home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/model.MODULE_b3ddbc97e5f0d1d64c82+155de413.neff --io_transposes /home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/neuronxcc-dfnjq5y6/io_transposes.json --output /home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/wrapped_neff.hlo --netlist /home/ubuntu/qwen3/context_encoding_model/_tp0_bk3/neuronxcc-dfnjq5y6/hlo_netlist.json +2025-08-07T13:54:40Z INFO 48502 [job.NeffWrapper.0]: There are no io transposes nor zero-sized parameters. Output will not be produced. +Hlo neff wrapper finished successfully. Have a wonderful day :D + +2025-08-07T13:54:40Z INFO 48502 [job.NeffWrapper.0]: Job #0 finished +2025-08-07T13:54:40Z INFO 48502 [pipeline.Pipeline.0]: Finished job job.NeffWrapper.0 +2025-08-07T13:54:40Z INFO 48502 [pipeline.Pipeline.0]: Finished pipeline Pipeline +2025-08-07T13:54:40Z INFO 48502 [pipeline.Pipeline.0]: Job #0 finished +2025-08-07T13:54:40Z INFO 47918 [root]: Subcommand returned with exitcode=0