https://niconegoto.hatenadiary.jp/entry/2017/03/29/222756

This post covers commits from 3/20 to 3/28. I was traveling in China and Kyoto, and the commits piled up in the meantime…

cmd/compile: don't merge load+op if other op arg is still live

This change is related to #19595. Normally we would want to turn a load and an op such as

    l = LOAD ptr mem
    y = OP x l

into

    y = OPload x ptr mem

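This merges the two into a single instruction. However, all OPload instructions require y to use the same register as x. If x is still needed after this instruction, x has to be copied somewhere else, and the benefit of merging the instructions in the first place is lost. For that reason, this CL disables the optimization and leaves the load and the op unmerged when the other op argument is still live. It also disables the optimization when the OP is in a deeper loop than the load.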
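
As a rough illustration of the two situations described above, the following sketch shows Go source in which the non-loaded operand is either still live or dead after the add. The function names are hypothetical, and whether the compiler actually merges the load depends on the Go version and target architecture; inspecting the generated assembly (for example with go build -gcflags=-S) is the only way to confirm it.

```go
package main

import "fmt"

// addAndKeep adds *p to x and also returns x, so x is still live after
// the add. Merging the load into the add would overwrite x's register,
// forcing an extra copy of x.
func addAndKeep(x int, p *int) (sum, orig int) {
	sum = x + *p
	return sum, x
}

// addOnly adds *p to x and never uses x again, so the register holding x
// can be reused for the result and a merged load+op costs nothing extra.
func addOnly(x int, p *int) int {
	return x + *p
}

func main() {
	v := 3
	fmt.Println(addAndKeep(2, &v))
	fmt.Println(addOnly(2, &v))
}
```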

Commit Message

cmd/compile: don't merge load+op if other op arg is still live
We want to merge a load and op into a single instruction

    l = LOAD ptr mem
    y = OP x l

into

    y = OPload x ptr mem

However, all of our OPload instructions require that y uses
the same register as x. If x is needed past this instruction, then
we must copy x somewhere else, losing the whole benefit of merging
the instructions in the first place.

Disable this optimization if x is live past the OP.

Also disable this optimization if the OP is in a deeper loop than the load.

Update #19595

Change-Id: I87f596aad7e91c9127bfb4705cbae47106e1e77a
Reviewed-on: https://go-review.googlesource.com/38337
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>

net/rpc: Create empty maps and slices as return type

This is a fix for #19588.

package main

import (
    "fmt"
    "log"
    "net"
    "net/rpc"
    "net/rpc/jsonrpc"
)

type Arith int

type Result int

func (t *Arith) Multiply(args int, result *Result) error {
    return nil
}

func (t *Arith) DoMap(args int, result *map[string]string) error {
    *result = map[string]string{}
    return nil
}

func (t *Arith) DoMap2(args int, result *map[string]string) error {
    return nil
}

func main() {
    go runServer()
    client, err := jsonrpc.Dial("tcp", fmt.Sprintf("127.0.0.1:9001"))
    if err != nil {
        log.Fatal(err.Error())
    }

    var reply Result
    err = client.Call("Arith.Multiply", 1, &reply)
    if err != nil {
        log.Fatal(err.Error())
    }
    fmt.Println("call Multiply success")
    var ret map[string]string
    err = client.Call("Arith.DoMap", 1, &ret)
    if err != nil {
        log.Fatal(err.Error())
    }
    fmt.Println("call DoMap success")
    err = client.Call("Arith.DoMap2", 1, &ret)
    if err != nil {
        log.Fatal(err.Error())
    }
    fmt.Println("call DoMap2 success")
}

func runServer() {
    arith := new(Arith)
    rpc.Register(arith)
    tcpAddr, err := net.ResolveTCPAddr("tcp", "127.0.0.1:9001")
    if err != nil {
        fmt.Println("ResolveTCPAddr error:", tcpAddr, err.Error())
        return
    }

    listener, err := net.ListenTCP("tcp", tcpAddr)
    if err != nil {
        fmt.Println("listen tcp error:", tcpAddr, err.Error())
        return
    }

    for {
        conn, err := listener.Accept()
        if err != nil {
            continue
        }
        jsonrpc.ServeConn(conn)
    }
}

Previously, running this code resulted in the error invalid error <nil>.

The cause was that, when a map or slice is used as the return type, the reflect.Value returned by server.readRequest held a nil value. This CL resolves the problem by creating and returning an empty value instead.
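
The difference between a nil map and an empty map in a reply can be seen with a small sketch. This is not the net/rpc code itself: encodeReply is a hypothetical helper that builds a reply value by reflection, optionally initializes the map with reflect.MakeMap, and encodes it as JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// encodeReply builds a *map[string]string by reflection, roughly the way a
// server could prepare a reply value, and returns its JSON encoding.
// When initialize is false the pointer refers to a nil map.
func encodeReply(initialize bool) string {
	replyType := reflect.TypeOf(map[string]string{})
	replyv := reflect.New(replyType) // pointer to a nil map
	if initialize {
		// Replace the nil map with an empty, usable map.
		replyv.Elem().Set(reflect.MakeMap(replyType))
	}
	b, err := json.Marshal(replyv.Interface())
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	fmt.Println(encodeReply(false)) // null
	fmt.Println(encodeReply(true))  // {}
}
```

A nil map is encoded as null, which a JSON-RPC client cannot tell apart from a missing result, while an empty map is encoded as {}.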

Commit Message

net/rpc: Create empty maps and slices as return type
When a map or slice is used as a return type create an empty value
rather than a nil value.

Fixes #19588

Change-Id: I577fd74956172329745d614ac37d4db8f737efb8
Reviewed-on: https://go-review.googlesource.com/38474
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>

os: parse command line without shell32.dll

This is a fix for #15588.

Go uses CommandLineToArgV from shell32.dll to parse command-line parameters, but shell32.dll is slow to load. This CL implements command-line parsing for Windows in Go, without shell32.dll, which makes Go programs start faster.
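
To give a feel for what parsing the command line in Go involves, here is a simplified sketch of Windows-style argument splitting. splitCommandLine is a hypothetical function, not the code from the CL. It assumes the commonly documented rules (whitespace separates arguments outside quotes, 2n backslashes before a quote yield n backslashes and toggle quoting, 2n+1 backslashes before a quote yield n backslashes and a literal quote) and ignores special cases such as the handling of the program name.

```go
package main

import (
	"fmt"
	"strings"
)

// splitCommandLine splits cmd into arguments using simplified
// Windows-style quoting and backslash rules.
func splitCommandLine(cmd string) []string {
	var args []string
	var cur strings.Builder
	inArg, inQuote := false, false
	backslashes := 0 // pending backslashes not yet written to cur

	// flush writes pending backslashes literally.
	flush := func() {
		cur.WriteString(strings.Repeat(`\`, backslashes))
		backslashes = 0
	}

	for i := 0; i < len(cmd); i++ {
		c := cmd[i]
		switch {
		case c == '\\':
			backslashes++
			inArg = true
		case c == '"':
			// Backslashes before a quote are halved; an odd count
			// escapes the quote, an even count toggles quoting.
			cur.WriteString(strings.Repeat(`\`, backslashes/2))
			if backslashes%2 == 1 {
				cur.WriteByte('"')
			} else {
				inQuote = !inQuote
			}
			backslashes = 0
			inArg = true
		case (c == ' ' || c == '\t') && !inQuote:
			flush()
			if inArg {
				args = append(args, cur.String())
				cur.Reset()
				inArg = false
			}
		default:
			flush()
			cur.WriteByte(c)
			inArg = true
		}
	}
	flush()
	if inArg {
		args = append(args, cur.String())
	}
	return args
}

func main() {
	for _, a := range splitCommandLine(`a.exe "hello world" c:\dir\ "x\"y"`) {
		fmt.Printf("%q\n", a)
	}
}
```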

The benchmarks are as follows:

on my Windows 7 amd64:

name	old time/op	new time/op	delta
RunningGoProgram-2	11.2ms ± 1%	10.4ms ± 2% -6.63%	(p=0.000 n=9+10)

on my Windows XP 386:

name	old time/op	new time/op	delta
RunningGoProgram-2	19.0ms ± 3%	12.1ms ± 1% -36.20%	(p=0.000 n=10+10)

on @egonelbre Windows 10 amd64:

name	old time/op	new time/op	delta
RunningGoProgram-8	17.0ms ± 1%	15.3ms ± 2% -9.71%	(p=0.000 n=10+10)

Commit Message

os: parse command line without shell32.dll
Go uses CommandLineToArgV from shell32.dll to parse command
line parameters. But shell32.dll is slow to load. Implement
Windows command line parsing in Go. This should make starting
Go programs faster.

I can see these speed ups for runtime.BenchmarkRunningGoProgram

on my Windows 7 amd64:
name                old time/op  new time/op  delta
RunningGoProgram-2  11.2ms ± 1%  10.4ms ± 2%  -6.63%  (p=0.000 n=9+10)

on my Windows XP 386:
name                old time/op  new time/op  delta
RunningGoProgram-2  19.0ms ± 3%  12.1ms ± 1%  -36.20%  (p=0.000 n=10+10)

on @egonelbre Windows 10 amd64:
name                old time/op  new time/op  delta
RunningGoProgram-8  17.0ms ± 1%  15.3ms ± 2%  -9.71%  (p=0.000 n=10+10)

This CL is based on CL 22932 by John Starks.

Fixes #15588.

Change-Id: Ib14be0206544d0d4492ca1f0d91fac968be52241
Reviewed-on: https://go-review.googlesource.com/37915
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

net/http: strip port from host in mux Handler

This CL is related to #10463.

Before attempting to match a handler, mux.Handler checks whether the request's host includes a port. If it does, the port is stripped and the path is cleaned up. For CONNECT requests, the path and host are used unchanged.
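
The port-stripping step can be sketched with net.SplitHostPort from the standard library. stripHostPort below is written for illustration and is an assumption about how such a helper could look, not a copy of the code in the CL.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// stripHostPort returns h without a trailing ":port".
// If h has no port, or cannot be parsed, it is returned unchanged.
func stripHostPort(h string) string {
	// Fast path: no colon means there is no port to strip.
	if !strings.Contains(h, ":") {
		return h
	}
	host, _, err := net.SplitHostPort(h)
	if err != nil {
		return h // on error, leave the host as it was
	}
	return host
}

func main() {
	fmt.Println(stripHostPort("example.com:8080"))
	fmt.Println(stripHostPort("example.com"))
}
```

With the port removed, a pattern registered for a host such as example.com/ can also match a request whose Host header is example.com:8080.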

Commit Message

net/http: strip port from host in mux Handler

This change strips the port in mux.Handler before attempting to
match handlers and adds a test for a request with port.

CONNECT requests continue to use the original path and port.

Fixes #10463

Change-Id: Iff3a2ca2b7f1d884eca05a7262ad6b7dffbcc30f
Reviewed-on: https://go-review.googlesource.com/38194
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>

cmd/compile: disable typPtr caching in the backend

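This change is related to #15756. Under that issue, many changes are being made so that compilation runs concurrently in order to speed up builds, and this CL is one of them. (josharian has sent many other CLs as well, but there were too many to cover them all here. See the issue for details.)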

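Normally, when a *T is generated, the resulting Type is cached in T to avoid recreating it later, but that caching was not concurrency safe. Rather than adding mutexes, this CL disables the caching before starting the backend, which achieves concurrency safety at low cost. The backend generates few enough new *Ts that the performance impact is small, particularly if some commonly used *Ts are pre-created.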
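
The idea can be sketched as follows. Type, ptrTo, cachingEnabled, and PtrTo are hypothetical names chosen for this example and do not reflect the compiler's actual identifiers. While caching is enabled (the single-threaded phase) the pointer type is stored in T; once it is disabled, cached values are still read but never written, so concurrent callers no longer mutate shared state.

```go
package main

import "fmt"

// Type is a toy stand-in for a compiler type.
type Type struct {
	Name  string
	ptrTo *Type // cached *T, written only while caching is enabled
}

// cachingEnabled is switched off before the concurrent phase starts.
var cachingEnabled = true

// PtrTo returns the type *t, reusing a cached value when one exists.
func PtrTo(t *Type) *Type {
	if p := t.ptrTo; p != nil {
		return p // read-only access, safe once writes have stopped
	}
	p := &Type{Name: "*" + t.Name}
	if cachingEnabled {
		t.ptrTo = p
	}
	return p
}

func main() {
	common := &Type{Name: "int"}
	rare := &Type{Name: "myStruct"}

	PtrTo(common)          // pre-create a commonly used *T
	cachingEnabled = false // from here on, no writes to the cache

	fmt.Println(PtrTo(common) == PtrTo(common)) // true: served from cache
	fmt.Println(PtrTo(rare) == PtrTo(rare))     // false: created each time
}
```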

The benchmarks are as follows.

name	old alloc/op	new alloc/op	delta
Template	40.3MB ± 0%	40.4MB ± 0%	+0.18% (p=0.001 n=10+10)
Unicode	29.8MB ± 0%	29.8MB ± 0%	+0.11% (p=0.043 n=10+9)
GoTypes	114MB ± 0%	115MB ± 0%	+0.33% (p=0.000 n=9+10)
SSA	855MB ± 0%	859MB ± 0%	+0.40% (p=0.000 n=10+10)
Flate	25.7MB ± 0%	25.8MB ± 0%	+0.35% (p=0.000 n=10+10)
GoParser	31.9MB ± 0%	32.1MB ± 0%	+0.58% (p=0.000 n=10+10)
Reflect	79.6MB ± 0%	79.9MB ± 0%	+0.31% (p=0.000 n=10+10)
Tar	26.9MB ± 0%	26.9MB ± 0%	+0.21% (p=0.000 n=10+10)
XML	42.5MB ± 0%	42.7MB ± 0%	+0.52% (p=0.000 n=10+9)

name	old allocs/op	new allocs/op	delta
Template	394k ± 1%	393k ± 0%	~ (p=0.529 n=10+10)
Unicode	319k ± 1%	319k ± 0%	~ (p=0.720 n=10+9)
GoTypes	1.15M ± 0%	1.15M ± 0%	+0.14% (p=0.035 n=10+10)
SSA	7.53M ± 0%	7.56M ± 0%	+0.45% (p=0.000 n=9+10)
Flate	238k ± 0%	238k ± 1%	~ (p=0.579 n=10+10)
GoParser	318k ± 1%	320k ± 1%	+0.64% (p=0.001 n=10+10)
Reflect	1.00M ± 0%	1.00M ± 0%	~ (p=0.393 n=10+10)
Tar	254k ± 0%	254k ± 1%	~ (p=0.075 n=10+10)
XML	395k ± 0%	397k ± 0%	+0.44% (p=0.001 n=10+9)

Commit Message

cmd/compile: disable typPtr caching in the backend
The only new Types that the backend introduces
are pointers to Types generated by the frontend.
Usually, when we generate a *T,
we cache the resulting Type in T,
to avoid recreating it later.
However, that caching is not concurrency safe.
Rather than add mutexes, this CL disables that
caching before starting the backend.
The backend generates few enough new *Ts that the
performance impact of this is small, particularly
if we pre-create some commonly used *Ts.

Updates #15756

name       old alloc/op    new alloc/op    delta
Template      40.3MB ± 0%     40.4MB ± 0%  +0.18%  (p=0.001 n=10+10)
Unicode       29.8MB ± 0%     29.8MB ± 0%  +0.11%  (p=0.043 n=10+9)
GoTypes        114MB ± 0%      115MB ± 0%  +0.33%  (p=0.000 n=9+10)
SSA            855MB ± 0%      859MB ± 0%  +0.40%  (p=0.000 n=10+10)
Flate         25.7MB ± 0%     25.8MB ± 0%  +0.35%  (p=0.000 n=10+10)
GoParser      31.9MB ± 0%     32.1MB ± 0%  +0.58%  (p=0.000 n=10+10)
Reflect       79.6MB ± 0%     79.9MB ± 0%  +0.31%  (p=0.000 n=10+10)
Tar           26.9MB ± 0%     26.9MB ± 0%  +0.21%  (p=0.000 n=10+10)
XML           42.5MB ± 0%     42.7MB ± 0%  +0.52%  (p=0.000 n=10+9)

name       old allocs/op   new allocs/op   delta
Template        394k ± 1%       393k ± 0%    ~     (p=0.529 n=10+10)
Unicode         319k ± 1%       319k ± 0%    ~     (p=0.720 n=10+9)
GoTypes        1.15M ± 0%      1.15M ± 0%  +0.14%  (p=0.035 n=10+10)
SSA            7.53M ± 0%      7.56M ± 0%  +0.45%  (p=0.000 n=9+10)
Flate           238k ± 0%       238k ± 1%    ~     (p=0.579 n=10+10)
GoParser        318k ± 1%       320k ± 1%  +0.64%  (p=0.001 n=10+10)
Reflect        1.00M ± 0%      1.00M ± 0%    ~     (p=0.393 n=10+10)
Tar             254k ± 0%       254k ± 1%    ~     (p=0.075 n=10+10)
XML             395k ± 0%       397k ± 0%  +0.44%  (p=0.001 n=10+9)

Change-Id: I6c031ed4f39108f26969c5712b73aa2fc08cd10a
Reviewed-on: https://go-review.googlesource.com/38417
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>

runtime: introduce a type for lfstacks

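This refactoring introduces an lfstack type. An lfstack is a uint64 representing the head of a lock-free stack, and its zero value is an empty list. Nodes must embed lfnode as their first field. The stack does not keep GC-visible pointers to nodes, so the caller must make sure the nodes are not garbage collected (typically by allocating them from manually managed memory). Creating an lfstack type with push, pop, and empty methods makes it possible to write Go-style code like the code in the CL.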
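
The method-based style can be illustrated with a small lock-free stack written for ordinary Go programs. This sketch deliberately swaps in a different technique from the runtime: it uses atomic.Pointer (Go 1.19 or later) with GC-visible pointers instead of a packed uint64, and the names stack and node are hypothetical. Only the API shape (a type whose zero value is an empty stack, with push, pop, and empty methods) mirrors the description above.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// node is one element of the stack.
type node struct {
	next *node
	val  int
}

// stack is a lock-free (Treiber) stack. The zero value is an empty stack.
type stack struct {
	head atomic.Pointer[node]
}

// push adds v to the top of the stack, retrying if another goroutine
// changed the head in the meantime.
func (s *stack) push(v int) {
	n := &node{val: v}
	for {
		old := s.head.Load()
		n.next = old
		if s.head.CompareAndSwap(old, n) {
			return
		}
	}
}

// pop removes and returns the top value; ok is false if the stack is empty.
func (s *stack) pop() (v int, ok bool) {
	for {
		old := s.head.Load()
		if old == nil {
			return 0, false
		}
		if s.head.CompareAndSwap(old, old.next) {
			return old.val, true
		}
	}
}

// empty reports whether the stack currently has no elements.
func (s *stack) empty() bool {
	return s.head.Load() == nil
}

func main() {
	var s stack
	s.push(1)
	s.push(2)
	v, _ := s.pop()
	fmt.Println(v, s.empty())
}
```

Because every push allocates a fresh node and the garbage collector keeps popped nodes alive while they are referenced, this sketch does not need the counter that a manually managed implementation uses to guard against reused nodes.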

Commit Message

runtime: introduce a type for lfstacks
The lfstack API is still a C-style API: lfstacks all have unhelpful
type uint64 and the APIs are package-level functions. Make the code
more readable and Go-style by creating an lfstack type with methods
for push, pop, and empty.

Change-Id: I64685fa3be0e82ae2d1a782a452a50974440a827
Reviewed-on: https://go-review.googlesource.com/38290
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>

runtime: disallow malloc or panic in scavenge

Mallocs and panics in the scavenge path can silently self-deadlock on mheap.lock. This CL therefore disallows malloc and panic while the heap lock is held by turning them into immediate throws.
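
The general pattern of failing fast instead of deadlocking can be sketched outside the runtime. Everything below (heap, noAlloc, alloc, scavenge, tryScavenge) is hypothetical and single-goroutine for simplicity. The point is only that a guard checked before taking a non-reentrant lock turns a silent hang into an immediate, visible failure.

```go
package main

import (
	"fmt"
	"sync"
)

// heap is a toy structure protected by a non-reentrant lock.
type heap struct {
	mu      sync.Mutex
	noAlloc bool // set while scavenge holds mu (simplified: one goroutine)
}

// alloc needs mu. If it is called while scavenge already holds mu, locking
// again would block forever, so it fails immediately instead.
func (h *heap) alloc() {
	if h.noAlloc {
		panic("alloc while heap lock is held")
	}
	h.mu.Lock()
	defer h.mu.Unlock()
	// ... allocation work would go here ...
}

// scavenge runs work with the heap lock held and allocation forbidden.
func (h *heap) scavenge(work func()) {
	h.noAlloc = true
	h.mu.Lock()
	defer func() {
		h.mu.Unlock()
		h.noAlloc = false
	}()
	work()
}

// tryScavenge runs scavenge and reports a failure message instead of crashing.
func tryScavenge(h *heap, work func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	h.scavenge(work)
	return "ok"
}

func main() {
	h := &heap{}
	fmt.Println(tryScavenge(h, func() {}))
	fmt.Println(tryScavenge(h, func() { h.alloc() }))
}
```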

Commit Message

runtime: disallow malloc or panic in scavenge
Mallocs and panics in the scavenge path are particularly nasty because
they're likely to silently self-deadlock on the mheap.lock. Avoid
sinking lots of time into debugging these issues in the future by
turning these into immediate throws.

Change-Id: Ib36fdda33bc90b21c32432b03561630c1f3c69bc
Reviewed-on: https://go-review.googlesource.com/38293
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>

cmd/compile/internal/syntax: add position info for { and } braces

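This change adds position information for { and } braces. Memory use for syntax.Nodes increases by about 1.9%, but that is negligible relative to the compiler's overall memory consumption.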

Commit Message

cmd/compile/internal/syntax: add position info for { and } braces
This change adds position information for { and } braces in the
source. There's a 1.9% increase in memory use for syntax.Nodes,
which is negligible relative to overall compiler memory consumption.

Parsing the std library (using syntax package only) and memory
consumption before this change (fastest of 5 runs):

  $ go test -run StdLib -fast
  parsed 1516827 lines (3392 files) in 780.612335ms (1943124 lines/s)
  allocated 379.903Mb (486.673Mb/s)

After this change (fastest of 5 runs):

  $ go test -run StdLib -fast
  parsed 1517022 lines (3394 files) in 793.487886ms (1911840 lines/s)
  allocated 387.086Mb (267B/line, 487.828Mb/s)

While not an exact apples-to-apples comparison (the syntax package
has changed and is also parsed), the overall impact is small.

Also: Small improvements to nodes_test.go.

Change-Id: Ib8a7f90bbe79de33d83684e33b1bf8dbc32e644a
Reviewed-on: https://go-review.googlesource.com/38435
Reviewed-by: Matthew Dempsky <mdempsky@google.com>

strconv: optimize decimal ints formatting with smallsString

This change is related to #19445.

Decimal integer formatting is changed to use smallsString. This continues an earlier CL that used a cache to speed up formatting of small decimal ints from 1 to 99.
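
The technique of formatting two decimal digits per step from a lookup table can be sketched like this. It is a minimal illustration under assumptions: formatUint and smalls are hypothetical names, the table is built at startup rather than written as a constant, and the real strconv code handles more cases (other bases, signed values, appending to a buffer).

```go
package main

import "fmt"

// smalls holds "00", "01", ..., "99" back to back, so the two digits of
// n (0 <= n < 100) are smalls[2*n] and smalls[2*n+1].
var smalls = func() string {
	b := make([]byte, 0, 200)
	for i := 0; i < 100; i++ {
		b = append(b, byte('0'+i/10), byte('0'+i%10))
	}
	return string(b)
}()

// formatUint returns the decimal representation of u, producing two
// digits per division instead of one.
func formatUint(u uint64) string {
	var buf [20]byte // enough for the largest uint64
	i := len(buf)
	for u >= 100 {
		is := u % 100 * 2
		u /= 100
		i -= 2
		buf[i+1] = smalls[is+1]
		buf[i] = smalls[is]
	}
	// Here u < 100: emit one or two remaining digits.
	is := u * 2
	i--
	buf[i] = smalls[is+1]
	if u >= 10 {
		i--
		buf[i] = smalls[is]
	}
	return string(buf[i:])
}

func main() {
	fmt.Println(formatUint(0), formatUint(12345), formatUint(18446744073709551615))
}
```

Halving the number of divisions is why the gains in the benchmark grow with the number of digits.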

Benchmark results for GOARCH=amd64 are as follows.

name	old time/op	new time/op	delta
FormatInt-4	2.51µs ± 2%	2.40µs ± 2%	-4.51% (p=0.000 n=9+10)
AppendInt-4	1.67µs ± 2%	1.61µs ± 3%	-3.74% (p=0.000 n=9+9)
FormatUint-4	698ns ± 2%	643ns ± 3%	-7.95% (p=0.000 n=10+8)
AppendUint-4	478ns ± 1%	418ns ± 2%	-12.61% (p=0.000 n=8+10)
AppendUintVarlen/1-4	9.30ns ± 6%	9.15ns ± 1%	~ (p=0.199 n=9+10)
AppendUintVarlen/12-4	9.12ns ± 0%	9.16ns ± 2%	~ (p=0.307 n=9+9)
AppendUintVarlen/123-4	18.6ns ± 2%	18.7ns ± 0%	~ (p=0.091 n=10+6)
AppendUintVarlen/1234-4	19.1ns ± 4%	17.7ns ± 1%	-7.35% (p=0.000 n=10+9)
AppendUintVarlen/12345-4	21.5ns ± 3%	20.7ns ± 3%	-3.78% (p=0.002 n=9+10)
AppendUintVarlen/123456-4	23.5ns ± 3%	20.9ns ± 1%	-11.14% (p=0.000 n=10+9)
AppendUintVarlen/1234567-4	25.0ns ± 2%	23.6ns ± 7%	-5.48% (p=0.004 n=9+10)
AppendUintVarlen/12345678-4	26.8ns ± 2%	23.4ns ± 2%	-12.79% (p=0.000 n=9+10)
AppendUintVarlen/123456789-4	29.8ns ± 3%	26.5ns ± 5%	-11.03% (p=0.000 n=10+10)
AppendUintVarlen/1234567890-4	31.6ns ± 3%	26.9ns ± 3%	-14.95% (p=0.000 n=10+9)
AppendUintVarlen/12345678901-4	33.8ns ± 3%	29.3ns ± 5%	-13.21% (p=0.000 n=10+10)
AppendUintVarlen/123456789012-4	35.5ns ± 4%	29.2ns ± 4%	-17.82% (p=0.000 n=10+10)
AppendUintVarlen/1234567890123-4	37.6ns ± 4%	31.4ns ± 3%	-16.48% (p=0.000 n=10+10)
AppendUintVarlen/12345678901234-4	39.8ns ± 6%	32.0ns ± 7%	-19.60% (p=0.000 n=10+10)
AppendUintVarlen/123456789012345-4	40.7ns ± 0%	34.4ns ± 4%	-15.55% (p=0.000 n=6+10)
AppendUintVarlen/1234567890123456-4	45.4ns ± 6%	35.1ns ± 4%	-22.66% (p=0.000 n=10+10)
AppendUintVarlen/12345678901234567-4	45.1ns ± 1%	36.7ns ± 4%	-18.77% (p=0.000 n=9+10)
AppendUintVarlen/123456789012345678-4	46.9ns ± 0%	36.4ns ± 3%	-22.49% (p=0.000 n=9+10)
AppendUintVarlen/1234567890123456789-4	50.6ns ± 6%	38.8ns ± 3%	-23.28% (p=0.000 n=10+10)
AppendUintVarlen/12345678901234567890-4	51.3ns ± 2%	38.4ns ± 0%	-25.00% (p=0.000 n=9+8)

Commit Message

strconv: optimize decimal ints formatting with smallsString

Benchmark results for GOARCH=amd64:

name                                     old time/op  new time/op  delta
FormatInt-4                              2.51µs ± 2%  2.40µs ± 2%   -4.51%  (p=0.000 n=9+10)
AppendInt-4                              1.67µs ± 2%  1.61µs ± 3%   -3.74%  (p=0.000 n=9+9)
FormatUint-4                              698ns ± 2%   643ns ± 3%   -7.95%  (p=0.000 n=10+8)
AppendUint-4                              478ns ± 1%   418ns ± 2%  -12.61%  (p=0.000 n=8+10)
AppendUintVarlen/1-4                     9.30ns ± 6%  9.15ns ± 1%     ~     (p=0.199 n=9+10)
AppendUintVarlen/12-4                    9.12ns ± 0%  9.16ns ± 2%     ~     (p=0.307 n=9+9)
AppendUintVarlen/123-4                   18.6ns ± 2%  18.7ns ± 0%     ~     (p=0.091 n=10+6)
AppendUintVarlen/1234-4                  19.1ns ± 4%  17.7ns ± 1%   -7.35%  (p=0.000 n=10+9)
AppendUintVarlen/12345-4                 21.5ns ± 3%  20.7ns ± 3%   -3.78%  (p=0.002 n=9+10)
AppendUintVarlen/123456-4                23.5ns ± 3%  20.9ns ± 1%  -11.14%  (p=0.000 n=10+9)
AppendUintVarlen/1234567-4               25.0ns ± 2%  23.6ns ± 7%   -5.48%  (p=0.004 n=9+10)
AppendUintVarlen/12345678-4              26.8ns ± 2%  23.4ns ± 2%  -12.79%  (p=0.000 n=9+10)
AppendUintVarlen/123456789-4             29.8ns ± 3%  26.5ns ± 5%  -11.03%  (p=0.000 n=10+10)
AppendUintVarlen/1234567890-4            31.6ns ± 3%  26.9ns ± 3%  -14.95%  (p=0.000 n=10+9)
AppendUintVarlen/12345678901-4           33.8ns ± 3%  29.3ns ± 5%  -13.21%  (p=0.000 n=10+10)
AppendUintVarlen/123456789012-4          35.5ns ± 4%  29.2ns ± 4%  -17.82%  (p=0.000 n=10+10)
AppendUintVarlen/1234567890123-4         37.6ns ± 4%  31.4ns ± 3%  -16.48%  (p=0.000 n=10+10)
AppendUintVarlen/12345678901234-4        39.8ns ± 6%  32.0ns ± 7%  -19.60%  (p=0.000 n=10+10)
AppendUintVarlen/123456789012345-4       40.7ns ± 0%  34.4ns ± 4%  -15.55%  (p=0.000 n=6+10)
AppendUintVarlen/1234567890123456-4      45.4ns ± 6%  35.1ns ± 4%  -22.66%  (p=0.000 n=10+10)
AppendUintVarlen/12345678901234567-4     45.1ns ± 1%  36.7ns ± 4%  -18.77%  (p=0.000 n=9+10)

regexp: reduce allocs in regexp.Match for onepass regex

This is the CL for #19573. (It's Hayabusa-san!!)

This CL reduces allocations in regexp.Match for onepass regular expressions. As the benchmarks show, it is a considerable speed improvement.

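For non-onepass regular expressions, regexp.Match did not allocate, because the length of m.matchcap is reset to zero (ncap=0 for regexp.Match). For onepass regular expressions, however, the length of m.matchcap stayed as it was even when ncap=0, which led to needless allocations.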

benchmark	old ns/op	new ns/op	delta
BenchmarkMatch_onepass_regex/32-4	6465	4628	-28.41%
BenchmarkMatch_onepass_regex/1K-4	208324	151558	-27.25%
BenchmarkMatch_onepass_regex/32K-4	7230259	5834492	-19.30%
BenchmarkMatch_onepass_regex/1M-4	234379810	166310682	-29.04%
BenchmarkMatch_onepass_regex/32M-4	7903529363	4981119950	-36.98%

benchmark	old MB/s	new MB/s	speedup
BenchmarkMatch_onepass_regex/32-4	4.95	6.91	1.40x
BenchmarkMatch_onepass_regex/1K-4	4.92	6.76	1.37x
BenchmarkMatch_onepass_regex/32K-4	4.53	5.62	1.24x
BenchmarkMatch_onepass_regex/1M-4	4.47	6.30	1.41x
BenchmarkMatch_onepass_regex/32M-4	4.25	6.74	1.59x

Commit Message

regexp: reduce allocs in regexp.Match for onepass regex
There were no allocations in regexp.Match for *non* onepass regex
because m.matchcap length is reset to zero (ncap=0 for regexp.Match).

But, as for onepass regex, m.matchcap length remains as it is even when
ncap=0 and it leads needless allocations.

benchmark                                    old ns/op      new ns/op      delta
BenchmarkMatch_onepass_regex/32-4      6465           4628           -28.41%
BenchmarkMatch_onepass_regex/1K-4      208324         151558         -27.25%
BenchmarkMatch_onepass_regex/32K-4     7230259        5834492        -19.30%
BenchmarkMatch_onepass_regex/1M-4      234379810      166310682      -29.04%
BenchmarkMatch_onepass_regex/32M-4     7903529363     4981119950     -36.98%

benchmark                                    old MB/s     new MB/s     speedup
BenchmarkMatch_onepass_regex/32-4      4.95         6.91         1.40x
BenchmarkMatch_onepass_regex/1K-4      4.92         6.76         1.37x
BenchmarkMatch_onepass_regex/32K-4     4.53         5.62         1.24x
BenchmarkMatch_onepass_regex/1M-4      4.47         6.30         1.41x
BenchmarkMatch_onepass_regex/32M-4     4.25         6.74         1.59x

benchmark                                    old allocs     new allocs     delta
BenchmarkMatch_onepass_regex/32-4      32             0              -100.00%
BenchmarkMatch_onepass_regex/1K-4      1024           0              -100.00%
BenchmarkMatch_onepass_regex/32K-4     32768          0              -100.00%
BenchmarkMatch_onepass_regex/1M-4      1048576        0              -100.00%
BenchmarkMatch_onepass_regex/32M-4     104559255      0              -100.00%

benchmark                                    old bytes      new bytes     delta
BenchmarkMatch_onepass_regex/32-4      512            0             -100.00%
BenchmarkMatch_onepass_regex/1K-4      16384          0             -100.00%
BenchmarkMatch_onepass_regex/32K-4     524288         0             -100.00%
BenchmarkMatch_onepass_regex/1M-4      16777216       0             -100.00%
BenchmarkMatch_onepass_regex/32M-4     2019458128     0             -100.00%

Fixes #19573

Change-Id: I033982d0003ebb0360bb40b92eb3941c781ec74d
Reviewed-on: https://go-review.googlesource.com/38270
Run-TryBot: Michael Matloob <matloob@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>