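Having covered the serialization path in "golang source code analysis: encoding/json (1)", we now turn to the Unmarshal function, whose source lives in encoding/json/decode.go. As before, let's start with the function's doc comment:
1. Its second parameter v is an interface value. If v is nil or not a pointer, Unmarshal returns an InvalidUnmarshalError.
2. Along the way it allocates maps, slices, and pointers as needed.
3. The JSON is decoded according to the following rules:
A. To unmarshal into a pointer, a JSON null sets the pointer to nil. Otherwise the JSON is unmarshaled into the value the pointer points at; if the pointer is nil, a new value is allocated for it first.
B. If the type implements the Unmarshaler interface, its UnmarshalJSON method is called, even when the input is a JSON null. Otherwise, if the type implements encoding.TextUnmarshaler and the input is a JSON quoted string, UnmarshalText is called with the unquoted form of the string.
C. To unmarshal into a struct, the JSON object keys are matched against the struct field names or tags: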
// To unmarshal JSON into a struct, Unmarshal matches incoming object
// keys to the keys used by Marshal (either the struct field name or its tag),
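An exact match is preferred, but a case-insensitive match is also accepted. By default, object keys without a matching field are ignored (see Decoder.DisallowUnknownFields for an alternative).
D. To unmarshal into an interface value, one of the following is stored in it: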
// bool, for JSON booleans
// float64, for JSON numbers
// string, for JSON strings
// []interface{}, for JSON arrays
// map[string]interface{}, for JSON objects
// nil for JSON null
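E. JSON does not distinguish between integers and floating-point values; both are simply numbers. When JSON is unmarshaled into an interface{} or a map[string]interface{}, every number therefore becomes a float64, which may print in scientific notation and cannot represent every integer beyond 2^53 exactly. To handle numbers more faithfully, decode with a Decoder after calling UseNumber, so that numbers are stored as json.Number.
F. To unmarshal a JSON array into a slice, the slice length is reset to zero and each element is then appended. As a special case, an empty JSON array replaces the slice with a new empty slice.
G. To unmarshal a JSON array into a Go array, elements are decoded position by position. If the Go array is shorter than the JSON array, the extra JSON elements are discarded; if it is longer, the remaining Go elements are set to zero values.
H. To unmarshal a JSON object into a map, Unmarshal first establishes a map to use: if the map is nil a new one is allocated, otherwise the existing map is reused and its entries are kept. The key-value pairs from the JSON object are then stored into it. The key type must satisfy the following: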
The map's key type must
// either be any string type, an integer, implement json.Unmarshaler, or
// implement encoding.TextUnmarshaler.
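4. If the JSON is not well-formed, a SyntaxError is returned.
5. If a JSON value is not appropriate for the target type, or a JSON number overflows it, Unmarshal skips that field and completes the rest as best it can. If no more serious error occurs, it returns an UnmarshalTypeError describing the earliest such problem. There is no guarantee that all fields after the problematic one are decoded.
6. A JSON null unmarshaled into an interface, map, pointer, or slice sets that value to nil; for any other type it has no effect.
7. When unmarshaling quoted strings, invalid UTF-8 or invalid UTF-16 surrogate pairs are not treated as an error; they are replaced by U+FFFD.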
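To make a few of these rules concrete, here is a small sketch that exercises rule 1 (non-pointer target), rule 3C (case-insensitive key matching), and rule 5 (type mismatch). The user type and the helper names are made up for this illustration; only the encoding/json calls are real.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// user is a hypothetical target type used only for this illustration.
type user struct {
	Name string `json:"name"`
	Age  int    `json:"age"`
}

// nonPointerFails reports whether a non-pointer target yields an
// *json.InvalidUnmarshalError (rule 1).
func nonPointerFails() bool {
	var target any = user{} // deliberately not a pointer
	err := json.Unmarshal([]byte(`{"name":"gopher"}`), target)
	var invalid *json.InvalidUnmarshalError
	return errors.As(err, &invalid)
}

// foldedKey decodes an object whose key differs from the tag only by case (rule 3C).
func foldedKey() (user, error) {
	var u user
	err := json.Unmarshal([]byte(`{"NAME":"gopher"}`), &u)
	return u, err
}

// typeMismatch decodes an object whose "age" member has the wrong JSON type (rule 5).
func typeMismatch() (user, error) {
	var u user
	err := json.Unmarshal([]byte(`{"age":"old","name":"gopher"}`), &u)
	return u, err
}

func main() {
	fmt.Println(nonPointerFails())

	u, err := foldedKey()
	fmt.Println(u.Name, err)

	u, err = typeMismatch()
	var typeErr *json.UnmarshalTypeError
	// The mismatched field is skipped, the later field is still filled in.
	fmt.Println(u.Name, u.Age, errors.As(err, &typeErr))
}
```

Note that in the last case Unmarshal returns a non-nil error even though part of the struct was populated, so callers should not assume the target is untouched when an UnmarshalTypeError comes back.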
func Unmarshal(data []byte, v any) error {
// Check for well-formedness.
// Avoids filling out half a data structure
// before discovering a JSON syntax error.
var d decodeState
err := checkValid(data, &d.scan)
if err != nil {
return err
}
d.init(data)
return d.unmarshal(v)
}
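A quick way to observe the effect of this up-front check is to feed Unmarshal a truncated document and look at the target afterwards. In this sketch the point type and decodeTruncated are hypothetical names chosen for the example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// point is a hypothetical target type for this illustration.
type point struct {
	X, Y int
}

// decodeTruncated feeds Unmarshal an object that is cut off after the first
// member and returns the target together with the error.
func decodeTruncated() (point, error) {
	p := point{X: -1, Y: -1}
	err := json.Unmarshal([]byte(`{"X": 1, "Y": `), &p)
	return p, err
}

func main() {
	p, err := decodeTruncated()
	var syntaxErr *json.SyntaxError
	fmt.Println(errors.As(err, &syntaxErr)) // the error is a *json.SyntaxError
	fmt.Println(p.X, p.Y)                   // X was never overwritten
}
```

Because checkValid fails before d.unmarshal runs, the well-formed prefix `"X": 1` is never applied to the struct.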
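Unmarshal declares a decodeState, which acts as the decoding state machine, checks that the JSON is well-formed, and then hands the data to the state machine to do the actual decoding. decodeState is defined as follows: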
type decodeState struct {
data []byte
off int // next read offset in data
opcode int // last read result
scan scanner
errorContext *errorContext
savedError error
useNumber bool
disallowUnknownFields bool
}
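Its core field is the JSON scanner. The heart of the scanner is its step function: it is fed the input one byte at a time and reports where each JSON token begins and ends, so that the decoder can extract the values and assign them to Go objects.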
type scanner struct {
// The step is a func to be called to execute the next transition.
// Also tried using an integer constant and a single func
// with a switch, but using the func directly was 10% faster
// on a 64-bit Mac Mini, and it's nicer to read.
step func(*scanner, byte) int
// Reached end of top-level value.
endTop bool
// Stack of what we're in the middle of - array values, object keys, object values.
parseState []int
// Error that happened, if any.
err error
// total bytes consumed, updated by decoder.Decode (and deliberately
// not set to zero by scan.reset)
bytes int64
}
The validity check uses step as well: if the end of the input is reached without an error, the JSON is well-formed.
func checkValid(data []byte, scan *scanner) error {
scan.reset()
for _, c := range data {
scan.bytes++
if scan.step(scan, c) == scanError {
return scan.err
}
}
if scan.eof() == scanError {
return scan.err
}
return nil
}
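The same check is available to callers through the exported json.Valid function. The short sketch below (isValid is just a hypothetical wrapper) shows what it accepts and rejects.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// isValid reports whether s is a single, complete, well-formed JSON value.
func isValid(s string) bool {
	return json.Valid([]byte(s))
}

func main() {
	inputs := []string{
		`{"a":[1,2,3]}`, // well-formed
		`{"a":[1,2,3}`,  // mismatched closing delimiter
		`{"a":1} x`,     // trailing data after the top-level value
		`  42  `,        // surrounding whitespace is allowed
		``,              // empty input is not a value
	}
	for _, in := range inputs {
		fmt.Printf("%q valid=%v\n", in, isValid(in))
	}
}
```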
reset sets step to stateBeginValue:
func (s *scanner) reset() {
s.step = stateBeginValue
s.parseState = s.parseState[0:0]
s.err = nil
s.endTop = false
}
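As stateBeginValue shows, the first non-space byte determines what kind of value follows: { starts an object, [ starts an array, " starts a string, and the remaining cases identify true, false, null, and numbers.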
func stateBeginValue(s *scanner, c byte) int {
if isSpace(c) {
return scanSkipSpace
}
switch c {
case '{':
s.step = stateBeginStringOrEmpty
return s.pushParseState(c, parseObjectKey, scanBeginObject)
case '[':
s.step = stateBeginValueOrEmpty
return s.pushParseState(c, parseArrayValue, scanBeginArray)
case '"':
s.step = stateInString
return scanBeginLiteral
case '-':
s.step = stateNeg
return scanBeginLiteral
case '0': // beginning of 0.123
s.step = state0
return scanBeginLiteral
case 't': // beginning of true
s.step = stateT
return scanBeginLiteral
case 'f': // beginning of false
s.step = stateF
return scanBeginLiteral
case 'n': // beginning of null
s.step = stateN
return scanBeginLiteral
}
if '1' <= c && c <= '9' { // beginning of 1234.5
s.step = state1
return scanBeginLiteral
}
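return s.error(c, "looking for beginning of value")
}
It then replaces step with the state function appropriate for what may come next. After the start of an object, for example, the next token must be either } (an empty object) or a string key: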
// stateBeginStringOrEmpty is the state after reading `{`.
func stateBeginStringOrEmpty(s *scanner, c byte) int {
if isSpace(c) {
return scanSkipSpace
}
if c == '}' {
n := len(s.parseState)
s.parseState[n-1] = parseObjectValue
return stateEndValue(s, c)
}
return stateBeginString(s, c)
}
If the object is not empty, the next token must be a string key:
// stateBeginString is the state after reading `{"key": value,`.
func stateBeginString(s *scanner, c byte) int {
if isSpace(c) {
return scanSkipSpace
}
if c == '"' {
s.step = stateInString
return scanBeginLiteral
}
return s.error(c, "looking for beginning of object key string")
}
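The step-function pattern is easier to see on a toy grammar. The sketch below is not the real scanner: it accepts only nested empty arrays such as [[],[]] with no whitespace, and every name in it (miniScanner, stateValue, opBegin and so on) is invented for this example. What it shares with encoding/json is the idea that the current state is a function stored in a field, and each state picks its successor.

```go
package main

import "fmt"

// Result codes returned by every step function (a hypothetical, much smaller
// set than the real scan* constants).
const (
	opContinue = iota // byte consumed, nothing notable
	opBegin           // an array was opened
	opEnd             // an array was closed
	opError           // the input is malformed
)

// miniScanner mirrors the shape of the real scanner: the current state is
// the function stored in step.
type miniScanner struct {
	step  func(*miniScanner, byte) int
	depth int  // number of currently open arrays
	done  bool // true once the top-level array has been closed
}

// stateValue expects the start of a value, which here can only be '['.
func stateValue(s *miniScanner, c byte) int {
	if c != '[' {
		return opError
	}
	s.depth++
	s.step = stateValueOrClose
	return opBegin
}

// stateValueOrClose is the state right after '[': either an immediate ']'
// or the first element.
func stateValueOrClose(s *miniScanner, c byte) int {
	if c == ']' {
		return closeArray(s)
	}
	return stateValue(s, c)
}

// stateAfterValue is the state after a complete value: ',' or ']' inside an
// array, and nothing at all once the top-level value has ended.
func stateAfterValue(s *miniScanner, c byte) int {
	switch {
	case s.depth == 0:
		return opError // trailing data after the top-level value
	case c == ',':
		s.step = stateValue
		return opContinue
	case c == ']':
		return closeArray(s)
	}
	return opError
}

// closeArray pops one nesting level and records whether the top level ended.
func closeArray(s *miniScanner) int {
	s.depth--
	s.done = s.depth == 0
	s.step = stateAfterValue
	return opEnd
}

// valid drives the machine one byte at a time, in the spirit of checkValid.
func valid(input string) bool {
	s := &miniScanner{step: stateValue}
	for i := 0; i < len(input); i++ {
		if s.step(s, input[i]) == opError {
			return false
		}
	}
	return s.done
}

func main() {
	for _, in := range []string{"[]", "[[],[]]", "[[]", "[,]", "[]]"} {
		fmt.Printf("%q valid=%v\n", in, valid(in))
	}
}
```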
The scanner reports the following result codes:
const (
// Continue.
scanContinue = iota // uninteresting byte
scanBeginLiteral // end implied by next result != scanContinue
scanBeginObject // begin object
scanObjectKey // just finished object key (string)
scanObjectValue // just finished non-last object value
scanEndObject // end object (implies scanObjectValue if possible)
scanBeginArray // begin array
scanArrayValue // just finished array value
scanEndArray // end array (implies scanArrayValue if possible)
scanSkipSpace // space byte; can skip; known to be last "continue" result
// Stop.
scanEnd // top-level value ended *before* this byte; known to be first "stop" result
scanError // hit an error, scanner.err.
)
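Each time an object or array is opened, the scanner pushes a parse state onto a stack so that the closing delimiter can be matched later. The push also enforces the maximum nesting depth: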
func (s *scanner) pushParseState(c byte, newParseState int, successState int) int {
s.parseState = append(s.parseState, newParseState)
if len(s.parseState) <= maxNestingDepth {
return successState
}
return s.error(c, "exceeded max depth")
}
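The depth limit can be observed from the outside with deeply nested arrays. This is a sketch assuming the limit of 10000 mentioned later in this article; nested and decodes are hypothetical helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// nested returns a JSON document made of depth nested empty arrays.
func nested(depth int) []byte {
	return []byte(strings.Repeat("[", depth) + strings.Repeat("]", depth))
}

// decodes reports whether Unmarshal accepts arrays nested to the given depth.
func decodes(depth int) bool {
	var v any
	return json.Unmarshal(nested(depth), &v) == nil
}

func main() {
	fmt.Println(decodes(10000)) // exactly at the limit
	fmt.Println(decodes(10001)) // one level too deep
}
```

Since pushParseState compares with <=, a document nested exactly maxNestingDepth levels deep is still accepted.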
Once the JSON has passed the check, it is handed to the state machine:
func (d *decodeState) init(data []byte) *decodeState {
d.data = data
Then the actual decoding begins:
func (d *decodeState) unmarshal(v any) error {
rv := reflect.ValueOf(v)
if rv.Kind() != reflect.Pointer || rv.IsNil() {
return &InvalidUnmarshalError{reflect.TypeOf(v)}
}
d.scan.reset()
d.scanWhile(scanSkipSpace)
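err := d.value(rv)
if err != nil {
return d.addErrorContext(err)
}
return d.savedError
}
It checks that v is a non-nil pointer, skips leading whitespace, and then uses reflection to assign the scanned value to v. Scanning works exactly as in the validity check above. scanWhile advances through the input until the scanner returns something other than the given opcode, which here means until the first non-space byte: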
func (d *decodeState) scanWhile(op int) {
s, data, i := &d.scan, d.data, d.off
for i < len(data) {
newOp := s.step(s, data[i])
i++
if newOp != op {
d.opcode = newOp
d.off = i
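return
}
}
d.off = len(data) + 1 // mark processed EOF with len+1
d.opcode = d.scan.eof()
}
The core function is value. It assigns according to the token just scanned and distinguishes three cases: JSON arrays, JSON objects, and literals (all remaining JSON types):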
func (d *decodeState) value(v reflect.Value) error {
switch d.opcode {
default:
panic(phasePanicMsg)
case scanBeginArray:
if v.IsValid() {
if err := d.array(v); err != nil {
return err
}
} else {
d.skip()
}
d.scanNext()
case scanBeginObject:
if v.IsValid() {
if err := d.object(v); err != nil {
return err
}
} else {
d.skip()
}
d.scanNext()
case scanBeginLiteral:
// All bytes inside literal return scanContinue op code.
start := d.readIndex()
d.rescanLiteral()
if v.IsValid() {
if err := d.literalStore(d.data[start:d.readIndex()], v, false); err != nil {
return err
}
}
}
return nil
}
Let's first look at how arrays are assigned:
func (d *decodeState) array(v reflect.Value) error {
// Check for unmarshaler.
u, ut, pv := indirect(v, false)
if u != nil {
start := d.readIndex()
d.skip()
return u.UnmarshalJSON(d.data[start:d.off])
}
if ut != nil {
d.saveError(&UnmarshalTypeError{Value: "array", Type: v.Type(), Offset: int64(d.off)})
d.skip()
return nil
}
v = pv
// Check type of target.
switch v.Kind() {
case reflect.Interface:
if v.NumMethod() == 0 {
// Decoding into nil interface? Switch to non-reflect code.
ai := d.arrayInterface()
v.Set(reflect.ValueOf(ai))
return nil
}
// Otherwise it's invalid.
fallthrough
default:
d.saveError(&UnmarshalTypeError{Value: "array", Type: v.Type(), Offset: int64(d.off)})
d.skip()
return nil
case reflect.Array, reflect.Slice:
break
}
i := 0
for {
// Look ahead for ] - can only happen on first iteration.
d.scanWhile(scanSkipSpace)
if d.opcode == scanEndArray {
break
}
// Get element of array, growing if necessary.
if v.Kind() == reflect.Slice {
// Grow slice if necessary
if i >= v.Cap() {
newcap := v.Cap() + v.Cap()/2
if newcap < 4 {
newcap = 4
}
newv := reflect.MakeSlice(v.Type(), v.Len(), newcap)
reflect.Copy(newv, v)
v.Set(newv)
}
if i >= v.Len() {
v.SetLen(i + 1)
}
}
if i < v.Len() {
// Decode into element.
if err := d.value(v.Index(i)); err != nil {
return err
}
} else {
// Ran out of fixed array: skip.
if err := d.value(reflect.Value{}); err != nil {
return err
}
}
i++
// Next token must be , or ].
if d.opcode == scanSkipSpace {
d.scanWhile(scanSkipSpace)
}
if d.opcode == scanEndArray {
break
}
if d.opcode != scanArrayValue {
panic(phasePanicMsg)
}
}
if i < v.Len() {
if v.Kind() == reflect.Array {
// Array. Zero the rest.
z := reflect.Zero(v.Type().Elem())
for ; i < v.Len(); i++ {
v.Index(i).Set(z)
}
} else {
v.SetLen(i)
}
}
if i == 0 && v.Kind() == reflect.Slice {
v.Set(reflect.MakeSlice(v.Type(), 0, 0))
}
return nil
}
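The tail of this function explains the slice and array rules from the doc comment. The following sketch shows them from the caller's side; intoSlice and intoArray are hypothetical helpers that start from pre-filled targets.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// intoSlice decodes data into a slice that already holds four elements.
func intoSlice(data string) ([]int, error) {
	s := []int{9, 9, 9, 9}
	err := json.Unmarshal([]byte(data), &s)
	return s, err
}

// intoArray decodes data into a fixed-size array that already holds values.
func intoArray(data string) ([3]int, error) {
	a := [3]int{7, 7, 7}
	err := json.Unmarshal([]byte(data), &a)
	return a, err
}

func main() {
	s, _ := intoSlice(`[1,2]`)
	fmt.Println(s, len(s)) // old elements beyond the JSON length are dropped

	a, _ := intoArray(`[1]`)
	fmt.Println(a) // the remaining array elements are zeroed

	a, _ = intoArray(`[1,2,3,4,5]`)
	fmt.Println(a) // surplus JSON elements are skipped

	var empty []int
	_ = json.Unmarshal([]byte(`[]`), &empty)
	fmt.Println(empty != nil, len(empty)) // a non-nil, empty slice
}
```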
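It first calls indirect to check whether the target type implements a custom decoding method; if it does, the raw bytes of the array are passed to UnmarshalJSON. Otherwise decoding proceeds recursively according to the concrete type of the target.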
func indirect(v reflect.Value, decodingNull bool) (Unmarshaler, encoding.TextUnmarshaler, reflect.Value) {
for {
if v.Type().NumMethod() > 0 && v.CanInterface() {
if u, ok := v.Interface().(Unmarshaler); ok {
return u, nil, reflect.Value{}
}
if !decodingNull {
if u, ok := v.Interface().(encoding.TextUnmarshaler); ok {
return nil, u, reflect.Value{}
}
}
}
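From the caller's side, the Unmarshaler branch can be exercised with a type that defines UnmarshalJSON. In this sketch, pair is a hypothetical type that is written as a two-element JSON array; its method receives the raw bytes of each inner array, which is what the u.UnmarshalJSON(d.data[start:d.off]) call in array hands over.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pair is a hypothetical type that is written as a two-element JSON array.
type pair struct {
	Key   string
	Value int
}

// UnmarshalJSON receives the raw bytes of the whole JSON value, here the array.
func (p *pair) UnmarshalJSON(data []byte) error {
	var raw []json.RawMessage
	if err := json.Unmarshal(data, &raw); err != nil {
		return err
	}
	if len(raw) != 2 {
		return fmt.Errorf("pair: want 2 elements, got %d", len(raw))
	}
	if err := json.Unmarshal(raw[0], &p.Key); err != nil {
		return err
	}
	return json.Unmarshal(raw[1], &p.Value)
}

// decodePairs decodes a JSON array of pairs.
func decodePairs(data string) ([]pair, error) {
	var ps []pair
	err := json.Unmarshal([]byte(data), &ps)
	return ps, err
}

func main() {
	ps, err := decodePairs(`[["a",1],["b",2]]`)
	fmt.Println(ps, err)

	_, err = decodePairs(`[["a"]]`)
	fmt.Println(err) // the custom method rejects the short array
}
```

The outer slice is decoded by the built-in array logic, while each element is delegated to the custom method because the pointer to a slice element implements Unmarshaler.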
If the target is an empty interface (one with no methods), arrayInterface() decodes the JSON array and the result is assigned to v via reflection:
v.Set(reflect.ValueOf(ai))
If the target is an array or a slice, decoding continues with the code that follows:
case reflect.Array, reflect.Slice:
break
A JSON array can only be decoded into a Go interface, array, or slice; any other target is a type error. arrayInterface decodes every element into an interface value:
func (d *decodeState) arrayInterface() []any {
v = append(v, d.valueInterface())
Decoding each element with valueInterface is itself recursive:
func (d *decodeState) valueInterface() (val any) {
switch d.opcode {
default:
panic(phasePanicMsg)
case scanBeginArray:
val = d.arrayInterface()
d.scanNext()
case scanBeginObject:
val = d.objectInterface()
d.scanNext()
case scanBeginLiteral:
val = d.literalInterface()
}
return
}
Literals are decoded into the plainest interface representation; the first byte of the literal decides whether it is null, a bool, a string, or a number:
func (d *decodeState) literalInterface() any {
start := d.readIndex()
d.rescanLiteral()
item := d.data[start:d.readIndex()]
switch c := item[0]; c {
case 'n': // null
return nil
case 't', 'f': // true, false
return c == 't'
case '"': // string
s, ok := unquote(item)
if !ok {
panic(phasePanicMsg)
}
return s
default: // number
if c != '-' && (c < '0' || c > '9') {
panic(phasePanicMsg)
}
n, err := d.convertNumber(string(item))
if err != nil {
d.saveError(err)
}
return n
}
}
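The effect of the interface path is easy to check by decoding into an empty interface and printing the dynamic type. The sketch also shows Decoder.UseNumber, which is what switches convertNumber from float64 to json.Number. dynamicType and asNumber are hypothetical helper names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// dynamicType decodes a JSON value into an empty interface and returns the
// Go type that was chosen for it.
func dynamicType(data string) string {
	var v any
	if err := json.Unmarshal([]byte(data), &v); err != nil {
		return "error: " + err.Error()
	}
	return fmt.Sprintf("%T", v)
}

// asNumber decodes a JSON number with Decoder.UseNumber, which keeps the
// original digits in a json.Number instead of converting to float64.
func asNumber(data string) (json.Number, error) {
	dec := json.NewDecoder(strings.NewReader(data))
	dec.UseNumber()
	var v any
	if err := dec.Decode(&v); err != nil {
		return "", err
	}
	n, ok := v.(json.Number)
	if !ok {
		return "", fmt.Errorf("not a number: %T", v)
	}
	return n, nil
}

func main() {
	for _, in := range []string{`null`, `true`, `"s"`, `1`, `[1]`, `{"a":1}`} {
		fmt.Println(in, "->", dynamicType(in))
	}

	// 2^53 + 1 has no exact float64 representation, but json.Number keeps
	// the digits as written.
	n, err := asNumber(`9007199254740993`)
	fmt.Println(n.String(), err)
}
```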
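When the target is an array or a slice, the elements are decoded one by one inside a for loop and stored according to the slice and array rules described earlier. A slice may have to be reallocated along the way as it grows: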
for {
// Look ahead for ] - can only happen on first iteration.
d.scanWhile(scanSkipSpace)
if d.opcode == scanEndArray {
break
}
if v.Kind() == reflect.Slice {
if i >= v.Cap() {
newcap := v.Cap() + v.Cap()/2
if newcap < 4 {
newcap = 4
}
newv := reflect.MakeSlice(v.Type(), v.Len(), newcap)
reflect.Copy(newv, v)
if i >= v.Len() {
v.SetLen(i + 1)
}
if i < v.Len() {
// Decode into element.
if err := d.value(v.Index(i)); err != nil {
return err
}
} else {
// Ran out of fixed array: skip.
if err := d.value(reflect.Value{}); err != nil {
return err
}
}
if i < v.Len() {
if v.Kind() == reflect.Array {
// Array. Zero the rest.
z := reflect.Zero(v.Type().Elem())
for ; i < v.Len(); i++ {
v.Index(i).Set(z)
}
} else {
v.SetLen(i)
}
}
if i == 0 && v.Kind() == reflect.Slice {
v.Set(reflect.MakeSlice(v.Type(), 0, 0))
}
With JSON arrays covered, let's turn to how literals are decoded:
func (d *decodeState) rescanLiteral() {
data, i := d.data, d.off
Switch:
switch data[i-1] {
case '"': // string
for ; i < len(data); i++ {
switch data[i] {
case '\\':
i++ // escaped char
case '"':
i++ // tokenize the closing quote too
break Switch
}
}
case '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '-': // number
for ; i < len(data); i++ {
switch data[i] {
case '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
'.', 'e', 'E', '+', '-':
default:
break Switch
}
}
case 't': // true
i += len("rue")
case 'f': // false
i += len("alse")
case 'n': // null
i += len("ull")
}
if i < len(data) {
d.opcode = stateEndValue(&d.scan, data[i])
} else {
d.opcode = scanEnd
}
d.off = i + 1
}
rescanLiteral simply walks the input to find where the literal ends. Afterwards, if the target value is valid, the literal is bound to it:
if v.IsValid() {
if err := d.literalStore(d.data[start:d.readIndex()], v, false); err != nil {
How the literal is bound depends on its own type and on the type of the target:
func (d *decodeState) literalStore(item []byte, v reflect.Value, fromQuoted bool) error {
// Check for unmarshaler.
if len(item) == 0 {
//Empty string given
d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
return nil
}
isNull := item[0] == 'n' // null
u, ut, pv := indirect(v, isNull)
if u != nil {
return u.UnmarshalJSON(item)
}
if ut != nil {
if item[0] != '"' {
if fromQuoted {
d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
return nil
}
val := "number"
switch item[0] {
case 'n':
val = "null"
case 't', 'f':
val = "bool"
}
d.saveError(&UnmarshalTypeError{Value: val, Type: v.Type(), Offset: int64(d.readIndex())})
return nil
}
s, ok := unquoteBytes(item)
if !ok {
if fromQuoted {
return fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type())
}
panic(phasePanicMsg)
}
return ut.UnmarshalText(s)
}
v = pv
switch c := item[0]; c {
case 'n': // null
// The main parser checks that only true and false can reach here,
// but if this was a quoted string input, it could be anything.
if fromQuoted && string(item) != "null" {
d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
break
}
switch v.Kind() {
case reflect.Interface, reflect.Pointer, reflect.Map, reflect.Slice:
v.Set(reflect.Zero(v.Type()))
// otherwise, ignore null for primitives/string
}
case 't', 'f': // true, false
value := item[0] == 't'
// The main parser checks that only true and false can reach here,
// but if this was a quoted string input, it could be anything.
if fromQuoted && string(item) != "true" && string(item) != "false" {
d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
break
}
switch v.Kind() {
default:
if fromQuoted {
d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
} else {
d.saveError(&UnmarshalTypeError{Value: "bool", Type: v.Type(), Offset: int64(d.readIndex())})
}
case reflect.Bool:
v.SetBool(value)
case reflect.Interface:
if v.NumMethod() == 0 {
v.Set(reflect.ValueOf(value))
} else {
d.saveError(&UnmarshalTypeError{Value: "bool", Type: v.Type(), Offset: int64(d.readIndex())})
}
}
case '"': // string
s, ok := unquoteBytes(item)
if !ok {
if fromQuoted {
return fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type())
}
panic(phasePanicMsg)
}
switch v.Kind() {
default:
d.saveError(&UnmarshalTypeError{Value: "string", Type: v.Type(), Offset: int64(d.readIndex())})
case reflect.Slice:
if v.Type().Elem().Kind() != reflect.Uint8 {
d.saveError(&UnmarshalTypeError{Value: "string", Type: v.Type(), Offset: int64(d.readIndex())})
break
}
b := make([]byte, base64.StdEncoding.DecodedLen(len(s)))
n, err := base64.StdEncoding.Decode(b, s)
if err != nil {
d.saveError(err)
break
}
v.SetBytes(b[:n])
case reflect.String:
if v.Type() == numberType && !isValidNumber(string(s)) {
return fmt.Errorf("json: invalid number literal, trying to unmarshal %q into Number", item)
}
v.SetString(string(s))
case reflect.Interface:
if v.NumMethod() == 0 {
v.Set(reflect.ValueOf(string(s)))
} else {
d.saveError(&UnmarshalTypeError{Value: "string", Type: v.Type(), Offset: int64(d.readIndex())})
}
}
default: // number
if c != '-' && (c < '0' || c > '9') {
if fromQuoted {
return fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type())
}
panic(phasePanicMsg)
}
s := string(item)
switch v.Kind() {
default:
if v.Kind() == reflect.String && v.Type() == numberType {
// s must be a valid number, because it's
// already been tokenized.
v.SetString(s)
break
}
if fromQuoted {
return fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type())
}
d.saveError(&UnmarshalTypeError{Value: "number", Type: v.Type(), Offset: int64(d.readIndex())})
case reflect.Interface:
n, err := d.convertNumber(s)
if err != nil {
d.saveError(err)
break
}
if v.NumMethod() != 0 {
d.saveError(&UnmarshalTypeError{Value: "number", Type: v.Type(), Offset: int64(d.readIndex())})
break
}
v.Set(reflect.ValueOf(n))
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
n, err := strconv.ParseInt(s, 10, 64)
if err != nil || v.OverflowInt(n) {
d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: v.Type(), Offset: int64(d.readIndex())})
break
}
v.SetInt(n)
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
n, err := strconv.ParseUint(s, 10, 64)
if err != nil || v.OverflowUint(n) {
d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: v.Type(), Offset: int64(d.readIndex())})
break
}
v.SetUint(n)
case reflect.Float32, reflect.Float64:
n, err := strconv.ParseFloat(s, v.Type().Bits())
if err != nil || v.OverflowFloat(n) {
d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: v.Type(), Offset: int64(d.readIndex())})
break
}
v.SetFloat(n)
}
}
return nil
}
For integer targets, for example, it ends up calling strconv.ParseInt:
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
n, err := strconv.ParseInt(s, 10, 64)
if err != nil || v.OverflowInt(n) {
d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: v.Type(), Offset: int64(d.readIndex())})
break
}
v.SetInt(n)
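Several branches of literalStore are visible from user code. The sketch below touches three of them: a JSON string decoded as base64 into a []byte, an integer overflow reported as UnmarshalTypeError, and the `,string` tag option that triggers the fromQuoted path. The record type and decodeRecord are hypothetical names for this example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// record is a hypothetical target type for this illustration.
type record struct {
	Blob  []byte `json:"blob"`         // JSON string, decoded as base64
	Small int8   `json:"small"`        // overflows for values above 127
	Count int    `json:"count,string"` // number wrapped in a JSON string
}

// decodeRecord decodes data into a fresh record.
func decodeRecord(data string) (record, error) {
	var r record
	err := json.Unmarshal([]byte(data), &r)
	return r, err
}

func main() {
	r, err := decodeRecord(`{"blob":"aGVsbG8=","small":12,"count":"42"}`)
	fmt.Println(string(r.Blob), r.Small, r.Count, err)

	// 300 does not fit in an int8: the field is skipped, the error is saved,
	// and decoding continues with the next member.
	r, err = decodeRecord(`{"small":300,"count":"7"}`)
	var typeErr *json.UnmarshalTypeError
	fmt.Println(r.Small, r.Count, errors.As(err, &typeErr))
}
```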
That completes the decoding and binding of literals. What remains is the most complex case, objects:
func (d *decodeState) object(v reflect.Value) error {
// Check for unmarshaler.
u, ut, pv := indirect(v, false)
if u != nil {
start := d.readIndex()
d.skip()
return u.UnmarshalJSON(d.data[start:d.off])
}
if ut != nil {
d.saveError(&UnmarshalTypeError{Value: "object", Type: v.Type(), Offset: int64(d.off)})
d.skip()
return nil
}
v = pv
t := v.Type()
// Decoding into nil interface? Switch to non-reflect code.
if v.Kind() == reflect.Interface && v.NumMethod() == 0 {
oi := d.objectInterface()
v.Set(reflect.ValueOf(oi))
return nil
}
var fields structFields
// Check type of target:
// struct or
// map[T1]T2 where T1 is string, an integer type,
// or an encoding.TextUnmarshaler
switch v.Kind() {
case reflect.Map:
// Map key must either have string kind, have an integer kind,
// or be an encoding.TextUnmarshaler.
switch t.Key().Kind() {
case reflect.String,
reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64,
reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
default:
if !reflect.PointerTo(t.Key()).Implements(textUnmarshalerType) {
d.saveError(&UnmarshalTypeError{Value: "object", Type: t, Offset: int64(d.off)})
d.skip()
return nil
}
}
if v.IsNil() {
v.Set(reflect.MakeMap(t))
}
case reflect.Struct:
fields = cachedTypeFields(t)
// ok
default:
d.saveError(&UnmarshalTypeError{Value: "object", Type: t, Offset: int64(d.off)})
d.skip()
return nil
}
var mapElem reflect.Value
var origErrorContext errorContext
if d.errorContext != nil {
origErrorContext = *d.errorContext
}
for {
// Read opening " of string key or closing }.
d.scanWhile(scanSkipSpace)
if d.opcode == scanEndObject {
// closing } - can only happen on first iteration.
break
}
if d.opcode != scanBeginLiteral {
panic(phasePanicMsg)
}
// Read key.
start := d.readIndex()
d.rescanLiteral()
item := d.data[start:d.readIndex()]
key, ok := unquoteBytes(item)
if !ok {
panic(phasePanicMsg)
}
// Figure out field corresponding to key.
var subv reflect.Value
destring := false // whether the value is wrapped in a string to be decoded first
if v.Kind() == reflect.Map {
elemType := t.Elem()
if !mapElem.IsValid() {
mapElem = reflect.New(elemType).Elem()
} else {
mapElem.Set(reflect.Zero(elemType))
}
subv = mapElem
} else {
var f *field
if i, ok := fields.nameIndex[string(key)]; ok {
// Found an exact name match.
f = &fields.list[i]
} else {
// Fall back to the expensive case-insensitive
// linear search.
for i := range fields.list {
ff := &fields.list[i]
if ff.equalFold(ff.nameBytes, key) {
f = ff
break
}
}
}
if f != nil {
subv = v
destring = f.quoted
for _, i := range f.index {
if subv.Kind() == reflect.Pointer {
if subv.IsNil() {
// If a struct embeds a pointer to an unexported type,
// it is not possible to set a newly allocated value
// since the field is unexported.
//
// See https://golang.org/issue/21357
if !subv.CanSet() {
d.saveError(fmt.Errorf("json: cannot set embedded pointer to unexported struct: %v", subv.Type().Elem()))
// Invalidate subv to ensure d.value(subv) skips over
// the JSON value without assigning it to subv.
subv = reflect.Value{}
destring = false
break
}
subv.Set(reflect.New(subv.Type().Elem()))
}
subv = subv.Elem()
}
subv = subv.Field(i)
}
if d.errorContext == nil {
d.errorContext = new(errorContext)
}
d.errorContext.FieldStack = append(d.errorContext.FieldStack, f.name)
d.errorContext.Struct = t
} else if d.disallowUnknownFields {
d.saveError(fmt.Errorf("json: unknown field %q", key))
}
}
// Read : before value.
if d.opcode == scanSkipSpace {
d.scanWhile(scanSkipSpace)
}
if d.opcode != scanObjectKey {
panic(phasePanicMsg)
}
d.scanWhile(scanSkipSpace)
if destring {
switch qv := d.valueQuoted().(type) {
case nil:
if err := d.literalStore(nullLiteral, subv, false); err != nil {
return err
}
case string:
if err := d.literalStore([]byte(qv), subv, true); err != nil {
return err
}
default:
d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal unquoted value into %v", subv.Type()))
}
} else {
if err := d.value(subv); err != nil {
return err
}
}
// Write value back to map;
// if using struct, subv points into struct already.
if v.Kind() == reflect.Map {
kt := t.Key()
var kv reflect.Value
switch {
case reflect.PointerTo(kt).Implements(textUnmarshalerType):
kv = reflect.New(kt)
if err := d.literalStore(item, kv, true); err != nil {
return err
}
kv = kv.Elem()
case kt.Kind() == reflect.String:
kv = reflect.ValueOf(key).Convert(kt)
default:
switch kt.Kind() {
case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
s := string(key)
n, err := strconv.ParseInt(s, 10, 64)
if err != nil || reflect.Zero(kt).OverflowInt(n) {
d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: kt, Offset: int64(start + 1)})
break
}
kv = reflect.ValueOf(n).Convert(kt)
case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
s := string(key)
n, err := strconv.ParseUint(s, 10, 64)
if err != nil || reflect.Zero(kt).OverflowUint(n) {
d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: kt, Offset: int64(start + 1)})
break
}
kv = reflect.ValueOf(n).Convert(kt)
default:
panic("json: Unexpected key type") // should never occur
}
}
if kv.IsValid() {
v.SetMapIndex(kv, subv)
}
}
// Next token must be , or }.
if d.opcode == scanSkipSpace {
d.scanWhile(scanSkipSpace)
}
if d.errorContext != nil {
// Reset errorContext to its original state.
// Keep the same underlying array for FieldStack, to reuse the
// space and avoid unnecessary allocs.
d.errorContext.FieldStack = d.errorContext.FieldStack[:len(origErrorContext.FieldStack)]
d.errorContext.Struct = origErrorContext.Struct
}
if d.opcode == scanEndObject {
break
}
if d.opcode != scanObjectValue {
panic(phasePanicMsg)
}
}
return nil
}
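For map targets, two details of this function are worth trying out: an existing map is reused rather than replaced, and integer key types are parsed from the JSON key string. mergeInto below is a hypothetical helper that starts from a map with one entry.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mergeInto decodes a JSON object into a map that already has an entry.
func mergeInto(data string) (map[int]string, error) {
	m := map[int]string{1: "one"}
	err := json.Unmarshal([]byte(data), &m)
	return m, err
}

func main() {
	m, err := mergeInto(`{"2":"two","3":"three"}`)
	fmt.Println(m, err) // the existing entry for key 1 is kept

	m, err = mergeInto(`{"x":"not a number"}`)
	fmt.Println(len(m), err) // the key cannot be parsed as an int, so nothing is stored
}
```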
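Like array, object first checks for a custom decoding method and falls back to the built-in logic if there is none. It then checks whether the target is an empty interface; if so, objectInterface does the decoding. Otherwise it checks whether the target is a map, in which case the key type must be a string or integer kind or implement encoding.TextUnmarshaler. Only then does it check whether the target is a struct. For a struct, cachedTypeFields uses reflection to collect the metadata of every field (its JSON name, index path, tag options, and so on) and caches the result: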
func cachedTypeFields(t reflect.Type) structFields {
if f, ok := fieldCache.Load(t); ok {
return f.(structFields)
}
f, _ := fieldCache.LoadOrStore(t, typeFields(t))
return f.(structFields)
}
As on the encoding side, a sync.Map serves as the cache; typeFields computes the entry on a cache miss:
var fieldCache sync.Map
func typeFields(t reflect.Type) structFields {
next := []field{{typ: t}}
for len(next) > 0 {
for _, f := range current {
if visited[f.typ] {
continue
}
for i := 0; i < f.typ.NumField(); i++ {
sf := f.typ.Field(i)
tag := sf.Tag.Get("json")
name, opts := parseTag(tag)
index[len(f.index)] = i
if opts.Contains("string") {
// Record found field and index sequence.
if name != "" || !sf.Anonymous || ft.Kind() != reflect.Struct {
field := field{
name: name,
tag: tagged,
index: index,
typ: ft,
omitEmpty: opts.Contains("omitempty"),
quoted: quoted,
}
HTMLEscape(&nameEscBuf, field.nameBytes)
fields = append(fields, field)
sort.Slice(fields, func(i, j int) bool {
x := fields
if x[i].name != x[j].name {
return x[i].name < x[j].name
}
for advance, i := 0, 0; i < len(fields); i += advance {
for advance = 1; i+advance < len(fields); advance++ {
fj := fields[i+advance]
sort.Sort(byIndex(fields))
return structFields{fields, nameIndex}
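typeFields walks the struct definition and its tags and records a description of each field in a slice. It sorts the fields by name to resolve conflicts between fields that share a JSON name, and finally returns them in index order together with a name index.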
// A field represents a single field found in a struct.
type field struct {
name string
nameBytes []byte // []byte(name)
equalFold func(s, t []byte) bool // bytes.EqualFold or equivalent
nameNonEsc string // `"` + name + `":`
nameEscHTML string // `"` + HTMLEscape(name) + `":`
tag bool
index []int
typ reflect.Type
omitEmpty bool
quoted bool
encoder encoderFunc
}
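To illustrate the idea behind this metadata, here is a heavily simplified sketch of a field table with an exact-name index and a case-insensitive fallback. It is not the real typeFields: it ignores embedded structs, name conflicts, and most tag options, treats any tag name of "-" as skipped, and uses strings.EqualFold as a stand-in for the equalFold function. All type and function names (fieldInfo, fieldTable, buildTable, lookup, sample) are invented for the example.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// fieldInfo is a hypothetical, stripped-down counterpart of the field struct.
type fieldInfo struct {
	name  string // JSON name: the tag name, or the Go field name
	index int    // position of the field in the struct
}

// fieldTable holds the fields plus an exact-name index, loosely modeled on
// structFields.
type fieldTable struct {
	list      []fieldInfo
	nameIndex map[string]int
}

// buildTable walks the exported, non-embedded fields of struct type t.
func buildTable(t reflect.Type) fieldTable {
	ft := fieldTable{nameIndex: make(map[string]int)}
	for i := 0; i < t.NumField(); i++ {
		sf := t.Field(i)
		if !sf.IsExported() || sf.Anonymous {
			continue // embedded and unexported fields are out of scope here
		}
		name, _, _ := strings.Cut(sf.Tag.Get("json"), ",")
		if name == "-" {
			continue // simplification: the field is excluded
		}
		if name == "" {
			name = sf.Name
		}
		ft.nameIndex[name] = len(ft.list)
		ft.list = append(ft.list, fieldInfo{name: name, index: i})
	}
	return ft
}

// lookup tries the exact-name map first and then falls back to a
// case-insensitive linear scan, following the order used by the object decoder.
func (ft fieldTable) lookup(key string) (fieldInfo, bool) {
	if i, ok := ft.nameIndex[key]; ok {
		return ft.list[i], true
	}
	for _, f := range ft.list {
		if strings.EqualFold(f.name, key) {
			return f, true
		}
	}
	return fieldInfo{}, false
}

// sample is a hypothetical struct used to build a table.
type sample struct {
	ID     int    `json:"id"`
	Name   string `json:"name,omitempty"`
	Secret string `json:"-"`
	Note   string
}

// sampleTable builds the table for the sample type.
func sampleTable() fieldTable {
	return buildTable(reflect.TypeOf(sample{}))
}

func main() {
	ft := sampleTable()
	for _, key := range []string{"id", "NAME", "Note", "Secret"} {
		f, ok := ft.lookup(key)
		fmt.Println(key, f.name, f.index, ok)
	}
}
```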
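With this preparation done, object enters its for loop and decodes the members of the JSON object one at a time. Each key is read as an item, and the value is bound according to the target type: for a map it is stored as the map value, for a struct it is stored in the matching field: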
for {
d.scanWhile(scanSkipSpace)
item := d.data[start:d.readIndex()]
key, ok := unquoteBytes(item)
if v.Kind() == reflect.Map {
elemType := t.Elem()
if !mapElem.IsValid() {
mapElem = reflect.New(elemType).Elem()
} else {
mapElem.Set(reflect.Zero(elemType))
}
subv = mapElem
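The target field is located by matching the key against the field names:
} else {
var f *field
if i, ok := fields.nameIndex[string(key)]; ok {
// Found an exact name match.
f = &fields.list[i]
} else {
// Fall back to the expensive case-insensitive
// linear search.
for i := range fields.list {
ff := &fields.list[i]
if ff.equalFold(ff.nameBytes, key) {
f = ff
break
}
}
}
If no field has exactly that name, the lookup falls back to case-insensitive matching, which is a linear scan over the fields and therefore O(n) per key. Once a field has been found, the value is assigned through reflection: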
if f != nil {
subv = v
destring = f.quoted
for _, i := range f.index {
if subv.Kind() == reflect.Pointer {
if subv.IsNil() {
subv.Set(reflect.New(subv.Type().Elem()))
if destring {
switch qv := d.valueQuoted().(type) {
case nil:
if err := d.literalStore(nullLiteral, subv, false); err != nil {
return err
}
case string:
if err := d.literalStore([]byte(qv), subv, true); err != nil {
return err
}
default:
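That is the core of the decoding code. Beyond that, JSON decoding in Go limits nesting to a maximum depth of 10000. json.RawMessage is a raw encoded JSON value: when a struct contains a json.RawMessage field, decoding copies the corresponding part of the input into that field as a []byte, unparsed. This allows delayed decoding, where the concrete type is only determined when the value is actually used; until then the JSON fragment simply stays around as a byte slice.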
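A typical use of json.RawMessage is an envelope whose payload type depends on another field. The sketch below assumes a made-up message format with "kind" and "payload" members; envelope and decodePayload are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// envelope is a hypothetical message whose payload type depends on Kind.
type envelope struct {
	Kind    string          `json:"kind"`
	Payload json.RawMessage `json:"payload"`
}

// decodePayload decodes the envelope first and the payload only once its
// kind is known.
func decodePayload(data string) (any, error) {
	var env envelope
	if err := json.Unmarshal([]byte(data), &env); err != nil {
		return nil, err
	}
	switch env.Kind {
	case "text":
		var s string
		err := json.Unmarshal(env.Payload, &s)
		return s, err
	case "ints":
		var xs []int
		err := json.Unmarshal(env.Payload, &xs)
		return xs, err
	}
	return nil, fmt.Errorf("unknown kind %q", env.Kind)
}

func main() {
	v, err := decodePayload(`{"kind":"text","payload":"hi"}`)
	fmt.Println(v, err)

	v, err = decodePayload(`{"kind":"ints","payload":[1,2,3]}`)
	fmt.Println(v, err)
}
```

The first Unmarshal call only copies the payload bytes; the cost of decoding them is paid in the second call, once the target type has been chosen.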
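As we have seen, literals and arrays can be decoded in a single front-to-back pass over the JSON text and bound to the Go value as we go. JSON objects are harder. Their members are unordered, so without optimization, matching them against the fields of a Go struct costs O(n^2). Decoding into a map or an interface involves no matching and performs reasonably well. For structs, Go applies an optimization: it uses reflection to build the field metadata for the type once, caches it, and indexes the fields by name, so that an exact match costs a single map lookup per key and the whole object is matched in roughly O(n). If a JSON key does not match a field name exactly, however, the lookup falls back to the case-insensitive linear scan, and the overall cost degrades to O(n^2) again. When the types are known in advance, this runtime work could be moved to compile time. In addition, decoding first walks the whole input to check that it is well-formed and then walks it a second time to bind the values. The up-front pass avoids allocations and reflection work when the input is malformed, but scanning the JSON twice is wasteful: in production most JSON is valid, so the first pass could be optimized away.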