Module: Chainer

Defined in:
lib/chainer/function_node.rb,
lib/chainer.rb,
lib/chainer/cuda.rb,
lib/chainer/link.rb,
lib/chainer/device.rb,
lib/chainer/backend.rb,
lib/chainer/version.rb,
lib/chainer/function.rb,
lib/chainer/reporter.rb,
lib/chainer/variable.rb,
lib/chainer/optimizer.rb,
lib/chainer/parameter.rb,
lib/chainer/serializer.rb,
lib/chainer/utils/conv.rb,
lib/chainer/utils/math.rb,
lib/chainer/initializer.rb,
lib/chainer/utils/array.rb,
lib/chainer/configuration.rb,
lib/chainer/testing/array.rb,
lib/chainer/training/util.rb,
lib/chainer/variable_node.rb,
lib/chainer/datasets/cifar.rb,
lib/chainer/datasets/mnist.rb,
lib/chainer/gradient_check.rb,
lib/chainer/hyperparameter.rb,
lib/chainer/utils/variable.rb,
lib/chainer/dataset/convert.rb,
lib/chainer/gradient_method.rb,
lib/chainer/optimizers/adam.rb,
lib/chainer/dataset/iterator.rb,
lib/chainer/training/trainer.rb,
lib/chainer/training/updater.rb,
lib/chainer/initializers/init.rb,
lib/chainer/utils/initializer.rb,
lib/chainer/functions/math/exp.rb,
lib/chainer/functions/math/sum.rb,
lib/chainer/training/extension.rb,
lib/chainer/initializers/normal.rb,
lib/chainer/serializers/marshal.rb,
lib/chainer/functions/array/cast.rb,
lib/chainer/initializers/uniform.rb,
lib/chainer/initializers/constant.rb,
lib/chainer/datasets/tuple_dataset.rb,
lib/chainer/links/model/classifier.rb,
lib/chainer/functions/array/reshape.rb,
lib/chainer/functions/array/squeeze.rb,
lib/chainer/functions/math/identity.rb,
lib/chainer/functions/noise/dropout.rb,
lib/chainer/links/connection/linear.rb,
lib/chainer/optimizers/momentum_sgd.rb,
lib/chainer/functions/array/rollaxis.rb,
lib/chainer/functions/activation/relu.rb,
lib/chainer/functions/activation/tanh.rb,
lib/chainer/functions/array/transpose.rb,
lib/chainer/functions/math/basic_math.rb,
lib/chainer/iterators/serial_iterator.rb,
lib/chainer/links/connection/embed_id.rb,
lib/chainer/training/standard_updater.rb,
lib/chainer/training/triggers/interval.rb,
lib/chainer/functions/array/select_item.rb,
lib/chainer/functions/connection/linear.rb,
lib/chainer/functions/activation/sigmoid.rb,
lib/chainer/functions/array/broadcast_to.rb,
lib/chainer/functions/pooling/pooling_2d.rb,
lib/chainer/training/extensions/snapshot.rb,
lib/chainer/functions/connection/embed_id.rb,
lib/chainer/functions/evaluation/accuracy.rb,
lib/chainer/training/extensions/evaluator.rb,
lib/chainer/training/extensions/log_report.rb,
lib/chainer/functions/activation/leaky_relu.rb,
lib/chainer/functions/activation/relu_grad2.rb,
lib/chainer/links/connection/convolution_2d.rb,
lib/chainer/functions/activation/log_softmax.rb,
lib/chainer/functions/pooling/max_pooling_2d.rb,
lib/chainer/training/extensions/print_report.rb,
lib/chainer/training/extensions/progress_bar.rb,
lib/chainer/functions/activation/sigmoid_grad.rb,
lib/chainer/functions/loss/mean_squared_error.rb,
lib/chainer/functions/connection/convolution_2d.rb,
lib/chainer/functions/loss/softmax_cross_entropy.rb,
lib/chainer/functions/pooling/average_pooling_2d.rb,
lib/chainer/functions/connection/deconvolution_2d.rb,
lib/chainer/training/extensions/exponential_shift.rb,
lib/chainer/links/normalization/batch_normalization.rb,
lib/chainer/functions/connection/convolution_2d_grad_w.rb,
lib/chainer/functions/normalization/batch_normalization.rb

Overview

Function node of the computational graph. FunctionNode is a class representing a node in a computational graph. The node corresponds to an application of a differentiable function to input variables. When a differentiable function is applied to `Chainer::Variable` objects, it creates an instance of a FunctionNode implementation and calls its `apply` method. The `apply` method basically does the following three things.

1. Adding an edge from the function node to the variable node corresponding to each input.
   The node of each input is extracted by `Chainer::Variable.node`.
2. Computing the output arrays of the function.
3. Creating a `Chainer::Variable` object for each output array and
   adding an edge from the node of the variable to the function node.

The output variables are then returned.
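
For illustration, a minimal FunctionNode subclass might look like the sketch below. The method names used here (forward, backward, retain_inputs, get_retained_inputs) are assumed to mirror the built-in functions under lib/chainer/functions; treat this as a sketch rather than a canonical implementation.

class Square < Chainer::FunctionNode
  # forward receives an Array of raw arrays (Numo/Cumo) and returns an Array of arrays.
  def forward(inputs)
    x, = inputs
    retain_inputs([0])   # keep x for the backward pass
    [x * x]
  end

  # backward receives the output gradients as Chainer::Variable s and
  # returns the input gradients, also as Chainer::Variable s.
  def backward(indexes, grad_outputs)
    x = get_retained_inputs.first
    gy, = grad_outputs
    [x * gy * 2.0]
  end
end

y, = Square.new.apply([Chainer::Variable.new(Numo::DFloat[1.0, 2.0, 3.0])])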

Defined Under Namespace

Modules: CUDA, Dataset, Datasets, Device, Functions, Initializers, Iterators, Links, Optimizers, ReportService, Serializers, Testing, Training, Utils Classes: AbstractDevice, AbstractSerializer, Chain, ChainList, Configuration, CpuDevice, Deserializer, DictSummary, Function, FunctionAdapter, FunctionNode, GpuDevice, GradientMethod, Hyperparameter, HyperparameterProxy, Initializer, Link, Optimizer, Parameter, Reporter, Serializer, Summary, UpdateRule, Variable, VariableNode, WeightDecay

Constant Summary

VERSION =
"0.4.1"

Class Method Summary

Class Method Details

._as_tuple(x) ⇒ Object



# File 'lib/chainer/gradient_check.rb', line 53

def _as_tuple(x)
  if x.is_a? Array
    return x
  else
    return [x]
  end
end

._copy_arrays(xs) ⇒ Object



# File 'lib/chainer/gradient_check.rb', line 2

def _copy_arrays(xs)
  xs.map{|x| Chainer.array?(x) ? x.dup : x}
end

.array?(obj) ⇒ Boolean

Returns true if the argument is either a Numo::NArray or a Cumo::NArray.

Parameters:

  • obj (Object)

Returns:

  • (Boolean)


# File 'lib/chainer/backend.rb', line 19

def array?(obj)
  if CUDA.available?
    return true if obj.kind_of?(Cumo::NArray)
  end
  return true if obj.kind_of?(Numo::NArray)
  false
end
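
For example (return values illustrated as comments):

Chainer.array?(Numo::DFloat[1, 2, 3])  # => true
Chainer.array?([1, 2, 3])              # => false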

.check_backward(func, x_data, y_grad, params = [], eps: 0.001, atol: 1e-5, rtol: 1e-4, no_grads: nil, dtype: nil) ⇒ Object

Note:

func is called many times to get numerical gradients for all inputs. This function does not work correctly when func behaves randomly, since it would produce different gradients on each call.

Tests the backward procedure of a given function.

This function automatically checks the backward process of the given function. For example, when you have a Chainer::Function class MyFunc that takes two arguments and returns one value, you can write its test like this:

def test_my_func
  func = MyFunc.new
  x1_data = Numo::NArray[...]
  x2_data = Numo::NArray[...]
  gy_data = Numo::NArray[...]
  check_backward(func, [x1_data, x2_data], gy_data)
end

This method creates Chainer::Variable objects from x_data and calls func with those variables to get its result as a Chainer::Variable. Then, it sets the y_grad array to the grad attribute of the result and calls the backward method to get the gradients of the inputs. To check the correctness of these gradients, the function calls numerical_grad to compute the gradients numerically and compares the two results with Chainer::Testing.assert_allclose. If input objects (x1_data and/or x2_data in this example) represent integer variables, their gradients are ignored.

You can simplify a test when MyFunc gets only one argument:

check_backward(func, x1_data, gy_data)

If MyFunc is a loss function that returns a zero-dimensional array, pass nil as gy_data. In this case, the grad attribute of the result is set to 1:

check_backward(my_loss_func, [x1_data, x2_data], nil)

If MyFunc returns multiple outputs, pass all gradients of the outputs as an Array:

gy1_data = Numo::NArray[...]
gy2_data = Numo::NArray[...]
check_backward(func, x1_data, [gy1_data, gy2_data])

You can also test a Chainer::Link. To check the gradients of the link's parameters, pass an Array of the parameters as the params argument:

check_backward(my_link, [x1_data, x2_data], gy_data, [my_link.W, my_link.b])

Note that the elements of params are not Numo::NArray s but Chainer::Variable s.

Proc objects are also acceptable as the func argument:

check_backward(lambda{|x1, x2| f(x1, x2)}, [x1_data, x2_data], gy_data)

Parameters:

  • func (Method, Proc)

    A function that takes Chainer::Variable s and returns Chainer::Variable s. func must return an Array of Chainer::Variable s or a single Chainer::Variable. A Chainer::Function object, a Chainer::Link object, or any callable satisfying this condition can be used.

  • x_data (Numo::NArray or Array<Numo::NArray>)

    A set of Numo::NArray s to be passed to func. If x_data is a single Numo::NArray object, it is treated as [x_data].

  • y_grad (Numo::NArray or Array<Numo::NArray> or nil)

    A set of Numo::NArray s representing the gradients of the return values of func. If y_grad is a single Numo::NArray object, it is treated as [y_grad]. If func is a loss function, y_grad should be set to nil.

  • params (Chainer::Variable or Array<Chainer::Variable>) (defaults to: [])

    A set of Chainer::Variable s whose gradients are checked. When func is a Chainer::Link object, set its parameters as params. If params is a single Chainer::Variable object, it is treated as [params].

  • eps (Float) (defaults to: 0.001)

    Epsilon value to be passed to numerical_grad.

  • atol (Float) (defaults to: 1e-5)

    Absolute tolerance to be passed to Chainer::Testing.assert_allclose.

  • rtol (Float) (defaults to: 1e-4)

    Relative tolerance to be passed to Chainer::Testing.assert_allclose.

  • no_grads (Array<Boolean>) (defaults to: nil)

    Flags to skip variables in the gradient assertion. It should have the same length as x_data.

  • dtype (Numo::NArray.class) (defaults to: nil)

    x_data and y_grad are cast to this dtype when calculating numerical gradients. Only float types and nil are allowed.

See Also:

  • numerical_grad


# File 'lib/chainer/gradient_check.rb', line 147

def check_backward(func, x_data, y_grad, params=[], eps: 0.001, atol: 1e-5, rtol: 1e-4, no_grads: nil, dtype: nil)
  x_data = _as_tuple(x_data)
  xm = Chainer.get_array_module(*x_data)
  if !y_grad.nil?
    y_grad = _as_tuple(y_grad)
  end

  params = _as_tuple(params)
  xs = x_data.map{|x| Chainer::Variable.new(x)}
  y = func.(*xs)
  y = _as_tuple(y)
  y = Chainer::Functions::Math::Identity.new.apply(y)

  y_grad = set_y_grad(y, y_grad)

  # Clear gradients which may exist if func calls backward inside of itself.
  clear_grads(xs)
  clear_grads(params)

  # We only need to call `backward` for one result `Chainer::Variable`.
  # `Chainer::Variable.backward` method calls `Chainer::Function.backward` of its creator.
  y[0].backward()

  param_data = params.map { |p| p.data }
  if dtype.nil?
    casted_xs = x_data.map { |x| Chainer::Variable.new(x) }
  else
    raise '`dtype` is allowed only float type' if dtype != xm::DFloat && dtype != xm::SFloat
    casted_xs = x_data.map { |x| x.is_a?(Numo::NArray) ? Chainer::Variable.new(x.cast_to(dtype)) : x  }
  end

  if no_grads.nil?
    no_grads = xs.map { |x| x.dtype != Numo::SFloat && x.dtype != Numo::DFloat }
  else
    raise "Length of no_grads param and xs should be same." if no_grads.size != xs.size
  end

  casted_data = casted_xs.map { |x| x.data.dup }

  no_grads.zip(xs).each do |skip, x|
    if skip
      raise "x.grad is not nil" unless x.grad.nil?
    else
      raise 'gradients of some arguments are not calculated' if x.grad.nil?
    end
  end

  # Keep the gradient arrays of params which may be overwritten by func
  params_grad = params.map(&:grad)

  if dtype.nil?
    one = Numo::DFloat.new().fill(1.0)
  else
    one = dtype.new().fill(1.0)
  end

  g = lambda do
    # This function is called twice in `numerical_grad`.
    # `one` is `1 + epsilon` or `1 - epsilon` in these calls.
    # See the document of `numerical_grad`.
    no_grads.zip(casted_xs, casted_data).each do |skip, cx, data|
      next if skip || cx.data.empty?
      # cast_to is required to store data with the given type
      data = (one * data).cast_to(data.class)
      cx.data = data
    end

    params.zip(param_data).each do |param, data|
      if !dtype.nil?
        param_dtype = dtype
      else
        param_dtype = param.dtype
      end
      # The inner cast_to is required to compute the multiplication in
      # `param_dtype` when data is a low-precision float.
      # The outer one is required to store data with the given type.
      param.data = (one * data.cast_to(param_dtype)).cast_to(param_dtype)
    end

    # Clear gradients to support func that calls backward inside of itself.
    clear_grads(casted_xs)
    clear_grads(params)

    ys = func.(*casted_xs)
    ys = _as_tuple(ys)
    ys_data = ys.map { |y| y.data }
    no_grads.zip(casted_xs, casted_data).each do |skip, cx, data|
      next if skip
      cx.data = data
    end
    params.zip(param_data).each do |param, data|
      param.data = data
    end
    ys_data
  end

  gx, = numerical_grad(g, [one], y_grad, eps)
  gx_accum = 0

  no_grads.zip(xs, casted_xs).each do |skip, x, cx|
    next if skip
    gxi = x.grad.flatten.dup
    cxi = cx.data.flatten.dup
    unless dtype.nil?
      gxi = gxi.cast_to(dtype)
      cxi = cxi.cast_to(dtype)
    end
    gx_accum += gxi.empty? ? 0 : gxi.dot(cxi)
  end

  params.zip(params_grad).each do |p, gpi|
    gpi = gpi.flatten.dup
    pi = p.data.flatten.dup
    unless dtype.nil?
      gpi = gpi.cast_to(dtype)
      pi = pi.cast_to(dtype)
    end
    gx_accum += gpi.dot(pi)
  end

  Chainer::Testing.assert_allclose(gx, gx_accum, atol: atol, rtol: rtol)
end
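
As a concrete illustration, the following sketch checks the gradients of an element-wise multiplication built from Chainer::Variable arithmetic (shapes and random values are arbitrary and only for illustration):

x1 = Numo::DFloat.new(2, 3).rand
x2 = Numo::DFloat.new(2, 3).rand
gy = Numo::DFloat.new(2, 3).rand

# func receives Chainer::Variable s and returns a Chainer::Variable
func = -> (a, b) { a * b }

Chainer.check_backward(func, [x1, x2], gy, eps: 1e-3, atol: 1e-4, rtol: 1e-4)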

.check_double_backward(func, x_data, y_grad, x_grad_grad, params = [], params_grad_grad = [], eps: 1e-3, atol: 1e-4, rtol: 1e-3, no_grads: nil, dtype: nil) ⇒ Object



# File 'lib/chainer/gradient_check.rb', line 270

def check_double_backward(func, x_data, y_grad, x_grad_grad, params=[], params_grad_grad=[], eps: 1e-3, atol: 1e-4, rtol: 1e-3, no_grads: nil, dtype: nil)
  x_data = _as_tuple(x_data)
  params = _as_tuple(params)
  n_x = x_data.size

  first_order_grad = -> *inputs do
    xs = inputs[0...n_x]
    gys = inputs[n_x..-1]

    y = _as_tuple(func.(*xs))
    # Let all elements of y share the same creator.
    # See the comment in check_backward.
    y = Chainer::Functions::Math::Identity.new.apply(y)
    set_y_grad(y, gys)
    y[0].backward(enable_double_backprop: true)

    xs.map(&:grad_var) + params.map(&:grad_var)
  end

  inputs = x_data + _as_tuple(y_grad)
  grad_grad = _as_tuple(x_grad_grad) + _as_tuple(params_grad_grad)
  check_backward(first_order_grad, inputs, grad_grad, params=params, eps: eps, atol: atol, rtol: rtol, no_grads: no_grads, dtype: dtype)
end
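
As the implementation above shows, check_double_backward wraps the first-order gradient computation in a function and runs check_backward on it, which effectively tests second-order (double) backpropagation. A minimal usage sketch, assuming the function under test supports double backpropagation (values are illustrative):

x   = Numo::DFloat.new(2, 3).rand
gy  = Numo::DFloat.new(2, 3).rand
ggx = Numo::DFloat.new(2, 3).rand

func = -> (a) { a * a }

Chainer.check_double_backward(func, x, gy, ggx)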

.configurationObject



# File 'lib/chainer.rb', line 97

def self.configuration
  @configuration ||= Configuration.new
end

.configure {|configuration| ... } ⇒ Object

Yields:



# File 'lib/chainer.rb', line 93

def self.configure
  yield(configuration)
end
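
For example, configuration attributes can be set inside the block (enable_backprop is an attribute used elsewhere in this module; other attributes depend on the Configuration class):

Chainer.configure do |config|
  config.enable_backprop = false
end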

.get_array_module(*args) ⇒ Class

Gets the appropriate array module, Numo or Cumo, for the given arrays.

Parameters:

  • args (Array<Chainer::Variable> or Array<Numo::NArray> or Array<Cumo::NArray>)

    Values to determine whether Numo or Cumo should be used.

Returns:

  • (Class)

    Cumo or Numo is returned based on the types of the arguments.



# File 'lib/chainer/backend.rb', line 6

def get_array_module(*args)
  arrays = args.map {|v| v.kind_of?(Chainer::Variable) ? v.data : v }
  if CUDA.available?
    return Cumo if arrays.any? {|a| a.kind_of?(Cumo::NArray) }
  end
  return Numo
end
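
For example, the returned module can be used to allocate arrays for the matching backend:

x  = Chainer::Variable.new(Numo::SFloat.new(2, 2).rand)
xm = Chainer.get_array_module(x)  # => Numo (Cumo when a Cumo::NArray is given and CUDA is available)
xm::SFloat.zeros(2, 2)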

.grad(outputs, inputs, grad_outputs: nil, grad_inputs: nil, set_grad: false, retain_grad: false, enable_double_backprop: false) ⇒ Object



# File 'lib/chainer/function_node.rb', line 248

def self.grad(outputs, inputs, grad_outputs: nil, grad_inputs: nil, set_grad: false, retain_grad: false, enable_double_backprop: false)
  # The implementation consists of three steps.

  if !outputs.is_a?(Array)
    raise TypeError, "outputs must be Array, not #{outputs.class}"
  end
  if !inputs.is_a?(Array)
    raise TypeError, "inputs must be Array, not #{inputs.class}"
  end
  if !grad_outputs.nil? && !grad_outputs.is_a?(Array)
    raise TypeError, "grad_outputs must be Array, not #{grad_outputs.class}"
  end
  if !grad_inputs.nil? && !grad_inputs.is_a?(Array)
    raise TypeError, "grad_inputs must be Array, not #{grad_inputs.class}"
  end

  # 1. Backward enumeration: all the nodes reachable backward from the output
  #    nodes are enumerated. The forward direction links are collected in
  #    this step. Note that the variable nodes whose requires_grad is false
  #    are ignored and their creators are not searched.
  candidate_funcs = outputs.map(&:creator_node).compact
  visited_funcs = Set.new
  forward_graph = {}

  while func = candidate_funcs.pop
    next if visited_funcs.include?(func)
    visited_funcs.add(func)

    func.inputs.each do |x|
      next unless x.requires_grad
      forward_graph[x] = [] if forward_graph[x].nil?
      forward_graph[x] << func
      creator = x.creator_node
      if creator && !visited_funcs.include?(creator)
        candidate_funcs << creator
      end
    end
  end

  # 2. Forward enumeration: all the nodes in the subgraph reachable from the
  #    input nodes are enumerated. The extracted (sub-)subgraph is the union
  #    of all paths that backpropagation will visit.
  candidate_vars = inputs.map(&:node)
  visited_funcs = Set.new
  grad_required = Set.new
  while x = candidate_vars.pop
    grad_required.add(x)
    forward_graph[x].each do |func|
      next if visited_funcs.include?(func)
      visited_funcs.add(func)
      func.outputs.each do |y_ref|
        y = y_ref.__getobj__
        if y && forward_graph[y]
          candidate_vars << y
        end
      end
    end
  end

  # 3. Backpropagation: the backpropagation is executed along the
  #    (sub-)subgraph. It uses the topological order of the subgraph which is
  #    induced by the reversed order of function applications ("rank").
  grads = {}  # mapping from variable nodes to their gradients

  # Initialize the gradient mapping.
  grad_outputs = [nil] * outputs.size if grad_outputs.nil?
  outputs.zip(grad_outputs).each do |y, gy|
    if gy.nil?
      gy_data = y.data.new_ones
      gy = Chainer::Variable.new(gy_data, requires_grad: false)
    end

    grads[y.node] = gy
  end

  unless grad_inputs.nil?
    inputs.zip(grad_inputs).each do |x, gx|
      grads[x.node] = gx unless gx.nil?
    end
  end

  # Backprop implementation. It edits grads which will only contain the
  # gradients w.r.t. the inputs.
  old_enable_backprop = Chainer.configuration.enable_backprop
  Chainer.configuration.enable_backprop = enable_double_backprop
  backprop(outputs, inputs, grad_required, retain_grad, grads)
  Chainer.configuration.enable_backprop = old_enable_backprop

  # Extract the gradients w.r.t. the inputs and return them.
  ret = inputs.map { |x| grads[x.node] }
  if set_grad
    inputs.zip(ret).each do |x, gx|
      x.grad_var = gx
    end
  end

  ret
end
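
A minimal usage sketch (values are illustrative): Chainer.grad returns the gradients of outputs with respect to inputs as Chainer::Variable s.

x = Chainer::Variable.new(Numo::DFloat[1.0, 2.0, 3.0])
y = x * x
gx, = Chainer.grad([y], [x])
# gx.data holds dy/dx = 2 * x, i.e. Numo::DFloat[2.0, 4.0, 6.0]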

.numerical_grad(f, inputs, grad_outputs, eps = 1e-3) ⇒ Array

Computes the numerical gradient by finite differences.

This function is used to implement gradient checks. For usage examples, see the unit tests of Chainer::Functions.

Parameters:

  • f (function)

    A Ruby function (e.g. a Proc) with no arguments that runs the forward computation and returns the result.

  • inputs (Array<Arrays>)

    Array of arrays that should be treated as inputs. Each element is slightly modified to realize the numerical gradient by finite differences.

  • grad_outputs (Array<Arrays>)

    Array of arrays that are treated as output gradients.

  • eps (Float) (defaults to: 1e-3)

    Epsilon value of finite differences.

Returns:

  • (Array)

    Numerical gradient arrays corresponding to inputs.



# File 'lib/chainer/gradient_check.rb', line 21

def numerical_grad(f, inputs, grad_outputs, eps=1e-3)
  raise unless eps > 0
  inputs = inputs.to_a
  grad_outputs = grad_outputs.to_a
  grads = inputs.map{|x| x.new_zeros()}

  inputs.zip(grads).each do |x, gx|
    orig_x = x.dup # hold original value
    x.each_with_index{|_, *i|
      orig = orig_x[*i]
      x[*i] = orig + eps
      ys1 = _copy_arrays(f.())
      x[*i] = orig - eps
      ys2 = _copy_arrays(f.())
      x[*i] = orig

      ys1.zip(ys2, grad_outputs).each do |y1, y2, gy|
        next if gy.nil?
        diff = y1 - y2
        if Chainer.array?(diff) && diff.empty?
          dot = 0
        else
          dot = (diff * gy).sum
        end
        gx[*i] += dot / (2 * eps)
      end
    }
  end

  return grads
end
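
A minimal sketch of calling this function directly (the forward computation and values are illustrative):

x  = Numo::DFloat[1.0, 2.0, 3.0]
gy = Numo::DFloat[1.0, 1.0, 1.0]
f  = -> { [x * x] }  # forward computation that reads x

gx, = Chainer.numerical_grad(f, [x], [gy])
# gx approximates d(x**2)/dx = 2 * x by central differences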