wjr-z opened issue #7244:
At present, there seem to be serious issues with the epoch mechanism and register usage.
For example, the following is a simple comparison of the native and epoch assembly for a double loop (wasmtime release-13.0.0).
native:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 4c 8b 57 08 mov 0x8(%rdi),%r10
8: 4d 8b 12 mov (%r10),%r10
b: 49 39 e2 cmp %rsp,%r10
e: 0f 87 2a 00 00 00 ja 3e <wasm[0]::function[0]+0x3e>
14: 31 c9 xor %ecx,%ecx
16: 45 31 c9 xor %r9d,%r9d
19: 41 83 c1 01 add $0x1,%r9d
1d: 41 81 f9 00 12 7a 00 cmp $0x7a1200,%r9d
24: 0f 8c ef ff ff ff jl 19 <wasm[0]::function[0]+0x19>
2a: 83 c1 01 add $0x1,%ecx
2d: 81 f9 40 9c 00 00 cmp $0x9c40,%ecx
33: 0f 8c dd ff ff ff jl 16 <wasm[0]::function[0]+0x16>
39: 48 89 ec mov %rbp,%rsp
3c: 5d pop %rbp
3d: c3 retq
3e: 0f 0b ud2
epoch:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 4c 8b 57 08 mov 0x8(%rdi),%r10
8: 4d 8b 12 mov (%r10),%r10
b: 49 39 e2 cmp %rsp,%r10
e: 0f 87 04 01 00 00 ja 118 <wasm[0]::function[0]+0x118>
14: 48 83 ec 20 sub $0x20,%rsp
18: 48 89 1c 24 mov %rbx,(%rsp)
1c: 4c 89 6c 24 08 mov %r13,0x8(%rsp)
21: 4c 89 7c 24 10 mov %r15,0x10(%rsp)
26: 48 8b 77 08 mov 0x8(%rdi),%rsi
2a: 4c 8b 4e 10 mov 0x10(%rsi),%r9
2e: 4c 8b 6f 18 mov 0x18(%rdi),%r13
32: 4d 8b 55 00 mov 0x0(%r13),%r10
36: 4d 39 ca cmp %r9,%r10
39: 0f 83 56 00 00 00 jae 95 <wasm[0]::function[0]+0x95>
3f: 45 31 ff xor %r15d,%r15d
42: 4d 8b 55 00 mov 0x0(%r13),%r10
46: 4d 39 ca cmp %r9,%r10
49: 0f 83 6f 00 00 00 jae be <wasm[0]::function[0]+0xbe>
4f: 31 db xor %ebx,%ebx
51: 4d 8b 5d 00 mov 0x0(%r13),%r11
55: 4d 39 cb cmp %r9,%r11
58: 0f 83 8d 00 00 00 jae eb <wasm[0]::function[0]+0xeb>
5e: 83 c3 01 add $0x1,%ebx
61: 81 fb 00 12 7a 00 cmp $0x7a1200,%ebx
67: 0f 8c e4 ff ff ff jl 51 <wasm[0]::function[0]+0x51>
6d: 41 83 c7 01 add $0x1,%r15d
71: 41 81 ff 40 9c 00 00 cmp $0x9c40,%r15d
78: 0f 8c c4 ff ff ff jl 42 <wasm[0]::function[0]+0x42>
7e: 48 8b 1c 24 mov (%rsp),%rbx
82: 4c 8b 6c 24 08 mov 0x8(%rsp),%r13
87: 4c 8b 7c 24 10 mov 0x10(%rsp),%r15
8c: 48 83 c4 20 add $0x20,%rsp
90: 48 89 ec mov %rbp,%rsp
93: 5d pop %rbp
94: c3 retq
95: 4d 39 ca cmp %r9,%r10
98: 0f 82 a1 ff ff ff jb 3f <wasm[0]::function[0]+0x3f>
9e: 48 8b 47 38 mov 0x38(%rdi),%rax
a2: 48 8b 80 b0 00 00 00 mov 0xb0(%rax),%rax
a9: 48 83 ec 20 sub $0x20,%rsp
ad: 48 89 f9 mov %rdi,%rcx
b0: ff d0 callq *%rax
b2: 48 83 c4 20 add $0x20,%rsp
b6: 49 89 c1 mov %rax,%r9
b9: e9 81 ff ff ff jmpq 3f <wasm[0]::function[0]+0x3f>
be: 4c 8b 4e 10 mov 0x10(%rsi),%r9
c2: 4d 39 ca cmp %r9,%r10
c5: 0f 82 84 ff ff ff jb 4f <wasm[0]::function[0]+0x4f>
cb: 48 8b 47 38 mov 0x38(%rdi),%rax
cf: 48 8b 80 b0 00 00 00 mov 0xb0(%rax),%rax
d6: 48 83 ec 20 sub $0x20,%rsp
da: 48 89 f9 mov %rdi,%rcx
dd: ff d0 callq *%rax
df: 48 83 c4 20 add $0x20,%rsp
e3: 49 89 c1 mov %rax,%r9
e6: e9 64 ff ff ff jmpq 4f <wasm[0]::function[0]+0x4f>
eb: 4c 8b 4e 10 mov 0x10(%rsi),%r9
ef: 4d 39 cb cmp %r9,%r11
f2: 0f 82 66 ff ff ff jb 5e <wasm[0]::function[0]+0x5e>
f8: 48 8b 4f 38 mov 0x38(%rdi),%rcx
fc: 48 8b 91 b0 00 00 00 mov 0xb0(%rcx),%rdx
103: 48 83 ec 20 sub $0x20,%rsp
107: 48 89 f9 mov %rdi,%rcx
10a: ff d2 callq *%rdx
10c: 48 83 c4 20 add $0x20,%rsp
110: 49 89 c1 mov %rax,%r9
113: e9 46 ff ff ff jmpq 5e <wasm[0]::function[0]+0x5e>
118: 0f 0b ud2
In the example above, some registers, such as ax and cx, are assigned to the epoch check block, and bx is not allocated at all. This is just a simple example; more complex workloads see a significant performance impact. On box_seal.wasm the overhead reaches 25%! After I tried to manually fix the issue with epoch (non-portable), the overhead was less than 7%.
wjr-z edited issue #7244:
At present, there seem to be serious issues with the epoch mechanism and register usage.
For example, the following is a simple comparison of the native and epoch assembly for a double loop (wasmtime release-13.0.0).
native:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 4c 8b 57 08 mov 0x8(%rdi),%r10
8: 4d 8b 12 mov (%r10),%r10
b: 49 39 e2 cmp %rsp,%r10
e: 0f 87 2a 00 00 00 ja 3e <wasm[0]::function[0]+0x3e>
14: 31 c9 xor %ecx,%ecx
16: 45 31 c9 xor %r9d,%r9d
19: 41 83 c1 01 add $0x1,%r9d
1d: 41 81 f9 00 12 7a 00 cmp $0x7a1200,%r9d
24: 0f 8c ef ff ff ff jl 19 <wasm[0]::function[0]+0x19>
2a: 83 c1 01 add $0x1,%ecx
2d: 81 f9 40 9c 00 00 cmp $0x9c40,%ecx
33: 0f 8c dd ff ff ff jl 16 <wasm[0]::function[0]+0x16>
39: 48 89 ec mov %rbp,%rsp
3c: 5d pop %rbp
3d: c3 retq
3e: 0f 0b ud2
epoch:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 4c 8b 57 08 mov 0x8(%rdi),%r10
8: 4d 8b 12 mov (%r10),%r10
b: 49 39 e2 cmp %rsp,%r10
e: 0f 87 04 01 00 00 ja 118 <wasm[0]::function[0]+0x118>
14: 48 83 ec 20 sub $0x20,%rsp
18: 48 89 1c 24 mov %rbx,(%rsp)
1c: 4c 89 6c 24 08 mov %r13,0x8(%rsp)
21: 4c 89 7c 24 10 mov %r15,0x10(%rsp)
26: 48 8b 77 08 mov 0x8(%rdi),%rsi
2a: 4c 8b 4e 10 mov 0x10(%rsi),%r9
2e: 4c 8b 6f 18 mov 0x18(%rdi),%r13
32: 4d 8b 55 00 mov 0x0(%r13),%r10
36: 4d 39 ca cmp %r9,%r10
39: 0f 83 56 00 00 00 jae 95 <wasm[0]::function[0]+0x95>
3f: 45 31 ff xor %r15d,%r15d
42: 4d 8b 55 00 mov 0x0(%r13),%r10
46: 4d 39 ca cmp %r9,%r10
49: 0f 83 6f 00 00 00 jae be <wasm[0]::function[0]+0xbe>
4f: 31 db xor %ebx,%ebx
51: 4d 8b 5d 00 mov 0x0(%r13),%r11
55: 4d 39 cb cmp %r9,%r11
58: 0f 83 8d 00 00 00 jae eb <wasm[0]::function[0]+0xeb>
5e: 83 c3 01 add $0x1,%ebx
61: 81 fb 00 12 7a 00 cmp $0x7a1200,%ebx
67: 0f 8c e4 ff ff ff jl 51 <wasm[0]::function[0]+0x51>
6d: 41 83 c7 01 add $0x1,%r15d
71: 41 81 ff 40 9c 00 00 cmp $0x9c40,%r15d
78: 0f 8c c4 ff ff ff jl 42 <wasm[0]::function[0]+0x42>
7e: 48 8b 1c 24 mov (%rsp),%rbx
82: 4c 8b 6c 24 08 mov 0x8(%rsp),%r13
87: 4c 8b 7c 24 10 mov 0x10(%rsp),%r15
8c: 48 83 c4 20 add $0x20,%rsp
90: 48 89 ec mov %rbp,%rsp
93: 5d pop %rbp
94: c3 retq
95: 4d 39 ca cmp %r9,%r10
98: 0f 82 a1 ff ff ff jb 3f <wasm[0]::function[0]+0x3f>
9e: 48 8b 47 38 mov 0x38(%rdi),%rax
a2: 48 8b 80 b0 00 00 00 mov 0xb0(%rax),%rax
a9: 48 83 ec 20 sub $0x20,%rsp
ad: 48 89 f9 mov %rdi,%rcx
b0: ff d0 callq *%rax
b2: 48 83 c4 20 add $0x20,%rsp
b6: 49 89 c1 mov %rax,%r9
b9: e9 81 ff ff ff jmpq 3f <wasm[0]::function[0]+0x3f>
be: 4c 8b 4e 10 mov 0x10(%rsi),%r9
c2: 4d 39 ca cmp %r9,%r10
c5: 0f 82 84 ff ff ff jb 4f <wasm[0]::function[0]+0x4f>
cb: 48 8b 47 38 mov 0x38(%rdi),%rax
cf: 48 8b 80 b0 00 00 00 mov 0xb0(%rax),%rax
d6: 48 83 ec 20 sub $0x20,%rsp
da: 48 89 f9 mov %rdi,%rcx
dd: ff d0 callq *%rax
df: 48 83 c4 20 add $0x20,%rsp
e3: 49 89 c1 mov %rax,%r9
e6: e9 64 ff ff ff jmpq 4f <wasm[0]::function[0]+0x4f>
eb: 4c 8b 4e 10 mov 0x10(%rsi),%r9
ef: 4d 39 cb cmp %r9,%r11
f2: 0f 82 66 ff ff ff jb 5e <wasm[0]::function[0]+0x5e>
f8: 48 8b 4f 38 mov 0x38(%rdi),%rcx
fc: 48 8b 91 b0 00 00 00 mov 0xb0(%rcx),%rdx
103: 48 83 ec 20 sub $0x20,%rsp
107: 48 89 f9 mov %rdi,%rcx
10a: ff d2 callq *%rdx
10c: 48 83 c4 20 add $0x20,%rsp
110: 49 89 c1 mov %rax,%r9
113: e9 46 ff ff ff jmpq 5e <wasm[0]::function[0]+0x5e>
118: 0f 0b ud2
In the example above, some registers, such as ax and cx, are assigned to the epoch check block. This is just a simple example; more complex workloads see a significant performance impact. On box_seal.wasm the overhead reaches 25%! After I tried to manually fix the issue with epoch (Unstable), the overhead was less than 7%.
In particular, for the inner and outer loops, the outer loop uses r10 for storage but the inner loop uses r11, which I cannot understand.
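For readers following the disassembly, the check that appears at the function entry and at each loop header can be modelled roughly as below. This is only a sketch under assumptions, not Wasmtime's implementation: `Store`, `new_epoch`, `epoch_check`, and `nested_loop` are hypothetical names, the slow path simply pushes the deadline forward, and the reload-then-recheck step mirrors what the cold blocks at `be` and `eb` appear to do.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the per-store state the generated code reads.
struct Store<'a> {
    /// Shared epoch counter; another thread would bump it in a real embedding.
    epoch: &'a AtomicU64,
    /// Epoch value at or after which the slow path must run.
    deadline: u64,
    /// Number of slow-path calls, kept only for illustration.
    slow_calls: u64,
}

impl Store<'_> {
    /// Slow path (the indirect `callq` in the listing): here it just moves
    /// the deadline one tick past the current epoch and returns it.
    #[cold]
    fn new_epoch(&mut self) -> u64 {
        self.slow_calls += 1;
        self.deadline = self.epoch.load(Ordering::Relaxed) + 1;
        self.deadline
    }
}

/// Fast path: one load and one compare against a deadline cached in a local.
#[inline(always)]
fn epoch_check(store: &mut Store<'_>, cached_deadline: &mut u64) {
    let now = store.epoch.load(Ordering::Relaxed);
    if now >= *cached_deadline {
        // Cold block: reload the real deadline and compare again before
        // paying for the call.
        *cached_deadline = store.deadline;
        if now >= *cached_deadline {
            *cached_deadline = store.new_epoch();
        }
    }
}

/// Double loop with do-while shape, checked at entry and at both loop headers.
/// Returns the number of inner iterations executed.
fn nested_loop(store: &mut Store<'_>, outer: u32, inner: u32) -> u64 {
    let mut cached = store.deadline;
    let mut iterations = 0u64;
    epoch_check(store, &mut cached); // function entry
    let mut i = 0;
    loop {
        epoch_check(store, &mut cached); // outer loop header
        let mut j = 0;
        loop {
            epoch_check(store, &mut cached); // inner loop header
            j += 1;
            iterations += 1;
            if j >= inner {
                break;
            }
        }
        i += 1;
        if i >= outer {
            break;
        }
    }
    iterations
}
```

In this model the only values that must stay live across the hot loop are the epoch pointer and the cached deadline; the value loaded from the epoch counter is a short-lived temporary, so which scratch register holds it at each header (r10 or r11 in the listing) does not by itself add instructions to the fast path.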
alexcrichton commented on issue #7244:
Thanks for the report! Would you be able to share a wasm file or an example loop in source code to help reproduce this locally?
wjr-z commented on issue #7244:
> Thanks for the report! Would you be able to share a wasm file or an example loop in source code to help reproduce this locally?
Thank you for your reply. In fact, I am actively searching for the cause, since I am interested in optimization.
This is the link to box_seal.wasm: https://github.com/jedisct1/webassembly-benchmarks/blob/master/2021-Q1/wasm/box_seal.wasm
Then, this is the code for the example loop.
(module
  (export "_start" (func $_start))
  (func $_start (; 0 ;)
    (local $i i32)
    (local $i2 i32)
    i32.const 0
    local.set $i
    loop $loop
      i32.const 0
      local.set $i2
      loop $loop2
        local.get $i2
        i32.const 1
        i32.add
        local.set $i2
        local.get $i2
        i32.const 80000
        i32.lt_s
        br_if $loop2
      end $loop2
      local.get $i
      i32.const 1
      i32.add
      local.set $i
      local.get $i
      i32.const 40000
      i32.lt_s
      br_if $loop
    end $loop
  )
)
wjr-z edited a comment on issue #7244:
> Thanks for the report! Would you be able to share a wasm file or an example loop in source code to help reproduce this locally?
Thank you for your reply. In fact, I am actively searching for the reason.
This is the link to box_seal.wasm: https://github.com/jedisct1/webassembly-benchmarks/blob/master/2021-Q1/wasm/box_seal.wasm
Then, this is the code for the example loop.
(module
  (export "_start" (func $_start))
  (func $_start (; 0 ;)
    (local $i i32)
    (local $i2 i32)
    i32.const 0
    local.set $i
    loop $loop
      i32.const 0
      local.set $i2
      loop $loop2
        local.get $i2
        i32.const 1
        i32.add
        local.set $i2
        local.get $i2
        i32.const 80000
        i32.lt_s
        br_if $loop2
      end $loop2
      local.get $i
      i32.const 1
      i32.add
      local.set $i
      local.get $i
      i32.const 40000
      i32.lt_s
      br_if $loop
    end $loop
  )
)
alexcrichton commented on issue #7244:
Thanks! Could you detail a bit more what you mean by "manually fix the issue with epoch (Unstable), the cost was only less than 7%"?
Looking at the disassembly it's not obvious to me what the issue is and how such a large win could be gained, so I'm curious how you were able to achieve it!
wjr-z commented on issue #7244:
> Thanks! Could you detail a bit more what you mean by "manually fix the issue with epoch (Unstable), the cost was only less than 7%"?
> Looking at the disassembly it's not obvious to me what the issue is and how such a large win could be gained, so I'm curious how you were able to achieve it!
Unfortunately, the data on the server was lost. I'll try to reproduce it next week.
Last updated: Dec 23 2024 at 12:05 UTC