Class: Datadog::DI::Serializer Private
- Inherits:
-
Object
- Object
- Datadog::DI::Serializer
- Defined in:
- lib/datadog/di/serializer.rb
Overview
This class is part of a private API. You should avoid using this class if possible, as it may be removed or be changed in the future.
Serializes captured snapshot to primitive types, which are subsequently serialized to JSON and sent to the backend.
This class performs actual removal of sensitive values from the snapshots. It uses Redactor to determine which values are sensitive and need to be removed.
Serializer normally ought not to invoke user (application) code, to guarantee predictable performance. However, objects like ActiveRecord models cannot be usefully serialized into primitive types without custom logic (for example, the attributes are more than 3 levels down from the top-level object which is the default capture depth, thus they won’t be captured at all). To accommodate complex objects, there is an extension mechanism implemented permitting registration of serializer callbacks for arbitrary types. Applications and libraries definining such serializer callbacks should be very careful to have predictable performance and avoid exceptions and infinite loops and other such issues.
All serialization methods take the names of the variables being serialized in order to be able to redact values.
The result of serialization should not reference parameter values when the values are mutable (currently, this only applies to string values). Serializer will duplicate such mutable values, so that if method arguments are captured at entry and then modified during method execution, the serialized values from entry are correctly preserved. Alternatively, we could pass a parameter to the serialization methods which would control whether values are duplicated. This may be more efficient but there would be additional overhead from passing this parameter all the time and the API would get more complex.
Note: “self” cannot be used as a parameter name in Ruby, therefore there should never be a conflict between instance variable serialization and method parameters.
Constant Summary collapse
- FATAL_EXCEPTION_CLASSES =
This constant is part of a private API. You should avoid using this constant if possible, as it may be removed or be changed in the future.
Exception classes that should never be caught during serialization. These represent fatal conditions (signals, interrupts, system exit) that must propagate to the caller.
[SignalException, Interrupt, SystemExit].freeze
- @@flat_registry =
This classvariable is part of a private API. You should avoid using this classvariable if possible, as it may be removed or be changed in the future.
Third-party library integration / custom serializers.
Dynamic instrumentation has limited payload sizes, and for efficiency reasons it is not desirable to transmit data to Datadog that will never contain useful information. Additionally, due to depth limits, desired data may not even be included in payloads when serialized with the default, naive serializer. Therefore, custom objects like ActiveRecord model instances may need custom serializers.
CUSTOMER NOTE: The API for defining custom serializers is not yet finalized. Please create an issue at github.com/datadog/dd-trace-rb/issues describing the object(s) you wish to serialize so that we can ensure your use case will be supported as the library evolves.
Note that the current implementation does not permit defining a serializer for a particular class, which is the simplest use case. This is because the library itself does not need this functionality yet, and it won’t help for ActiveRecord models (that derive from a common base class but are all of different classes) or for Mongoid models (that do not have a common base class at all but include a standard Mongoid module).
Important: these serializers are NOT used in log messages. They are only used for variables that are captured in the snapshots.
Exception handling: If a custom serializer’s condition lambda raises an exception (e.g., regex match against invalid UTF-8 strings), the exception will be logged at WARN level, then the serializer will be skipped and the next serializer will be tried. This prevents custom serializers from breaking the entire serialization process.
IMPORTANT: Custom serializers MUST produce data that can be JSON-encoded. Specifically, custom serializers MUST NOT produce strings with binary encoding (ASCII-8BIT) containing non-ASCII code points (bytes >= 0x80) that cannot be automatically transcoded to UTF-8. Such strings will cause JSON encoding to fail, which will result in the probe being disabled and an ERROR status being reported. If your data contains binary content, encode it to a text representation (e.g., Base64, hex string, or UTF-8 with replacement characters) before returning it from the custom serializer.
[]
Instance Attribute Summary collapse
- #redactor ⇒ Object readonly private
- #settings ⇒ Object readonly private
- #telemetry ⇒ Object readonly private
Class Method Summary collapse
Instance Method Summary collapse
- #combine_args(args, kwargs, target_self) ⇒ Object private
-
#initialize(settings, redactor, telemetry: nil) ⇒ Serializer
constructor
private
A new instance of Serializer.
-
#serialize_args(args, kwargs, target_self, depth: settings.dynamic_instrumentation.max_capture_depth, attribute_count: settings.dynamic_instrumentation.max_capture_attribute_count) ⇒ Object
private
Serializes positional and keyword arguments to a method, as obtained by a method probe.
-
#serialize_value(value, name: nil, depth: settings.dynamic_instrumentation.max_capture_depth, attribute_count: nil, type: nil) ⇒ Object
private
Serializes a single named value.
-
#serialize_value_for_message(value, depth = 1) ⇒ Object
private
This method is used for serializing arbitrary values into log messages.
-
#serialize_vars(vars, depth: settings.dynamic_instrumentation.max_capture_depth, attribute_count: settings.dynamic_instrumentation.max_capture_attribute_count) ⇒ Object
private
Serializes variables captured by a line probe.
Constructor Details
#initialize(settings, redactor, telemetry: nil) ⇒ Serializer
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Returns a new instance of Serializer.
96 97 98 99 100 |
# File 'lib/datadog/di/serializer.rb', line 96 def initialize(settings, redactor, telemetry: nil) @settings = settings @redactor = redactor @telemetry = telemetry end |
Instance Attribute Details
#redactor ⇒ Object (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
103 104 105 |
# File 'lib/datadog/di/serializer.rb', line 103 def redactor @redactor end |
#settings ⇒ Object (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
102 103 104 |
# File 'lib/datadog/di/serializer.rb', line 102 def settings @settings end |
#telemetry ⇒ Object (readonly)
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
104 105 106 |
# File 'lib/datadog/di/serializer.rb', line 104 def telemetry @telemetry end |
Class Method Details
.register(condition: nil, &block) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
92 93 94 |
# File 'lib/datadog/di/serializer.rb', line 92 def self.register(condition: nil, &block) @@flat_registry << {condition: condition, proc: block} end |
Instance Method Details
#combine_args(args, kwargs, target_self) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
106 107 108 109 110 111 112 113 114 115 116 |
# File 'lib/datadog/di/serializer.rb', line 106 def combine_args(args, kwargs, target_self) counter = 0 combined = args.each_with_object({}) do |value, c| counter += 1 # Conversion to symbol is needed here to put args ahead of # kwargs when they are merged below. c[:"arg#{counter}"] = value end.update(kwargs) combined[:self] = target_self combined end |
#serialize_args(args, kwargs, target_self, depth: settings.dynamic_instrumentation.max_capture_depth, attribute_count: settings.dynamic_instrumentation.max_capture_attribute_count) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Serializes positional and keyword arguments to a method, as obtained by a method probe.
UI supports a single argument list only and does not distinguish between positional and keyword arguments. We convert positional arguments to keyword arguments (“arg1”, “arg2”, …) and ensure the positional arguments are listed first.
Instance variables are technically a hash just like kwargs, we take them as a separate parameter to avoid a hash merge in upstream code.
129 130 131 132 133 134 |
# File 'lib/datadog/di/serializer.rb', line 129 def serialize_args(args, kwargs, target_self, depth: settings.dynamic_instrumentation.max_capture_depth, attribute_count: settings.dynamic_instrumentation.max_capture_attribute_count) combined = combine_args(args, kwargs, target_self) serialize_vars(combined, depth: depth, attribute_count: attribute_count) end |
#serialize_value(value, name: nil, depth: settings.dynamic_instrumentation.max_capture_depth, attribute_count: nil, type: nil) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Serializes a single named value.
The name is needed to perform sensitive data redaction.
In some cases, the value being serialized does not have a name (for example, it is the return value of a method). In this case name can be nil.
Returns a data structure comprised of only values of basic types (integers, strings, arrays, hashes).
Respects string length, collection size and traversal depth limits.
160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 |
# File 'lib/datadog/di/serializer.rb', line 160 def serialize_value(value, name: nil, depth: settings.dynamic_instrumentation.max_capture_depth, attribute_count: nil, type: nil) attribute_count ||= settings.dynamic_instrumentation.max_capture_attribute_count cls = type || value.class begin if redactor.redact_type?(value) return {type: class_name(cls), notCapturedReason: "redactedType"} end if name && redactor.redact_identifier?(name) return {type: class_name(cls), notCapturedReason: "redactedIdent"} end @@flat_registry.each do |entry| condition = entry[:condition] if condition begin condition_result = condition.call(value) rescue => e # If a custom serializer condition raises an exception (e.g., regex match # against invalid UTF-8), skip it and continue with the next serializer. # We don't want custom serializer conditions to break the entire serialization. # # Custom serializers may be defined by customers (in which case we should # surface errors so they can fix their serializers) or they may be defined # internally by dd-trace-rb (in which case we need to fix them). We use # WARN level to surface these errors in either case. Datadog.logger.warn("DI: Custom serializer condition failed: #{e.class}: #{e.message}") telemetry&.report(e, description: "Custom serializer condition failed") next end if condition_result serializer_proc = entry.fetch(:proc) return serializer_proc.call(self, value, name: nil, depth: depth) end end end serialized = {type: class_name(cls)} # https://github.com/soutaro/steep/issues/1860 # @type var serialized: untyped case value when NilClass serialized.update(isNull: true) when Integer, Float, TrueClass, FalseClass serialized.update(value: value.to_s) when Time # This path also handles DateTime values although they do not need # to be explicitly added to the case statement. serialized.update(value: value.iso8601) when Date serialized.update(value: value.to_s) when String, Symbol need_dup = false value = if String === value # This is the only place where we duplicate the value, currently. # All other values are immutable primitives (e.g. numbers). # However, do not duplicate if the string is frozen, or if # it is later truncated. need_dup = !value.frozen? value else value.to_s end # Handle binary strings and invalid UTF-8 by escaping to JSON-safe format. # See escape_binary_string for details on the escaping format. # # Truncate binary data BEFORE escaping to avoid cutting mid-escape-sequence. # For regular strings, the limit is applied to string length in characters. max = settings.dynamic_instrumentation.max_capture_string_length if value.encoding == Encoding::BINARY || !value.valid_encoding? # Truncate binary data BEFORE escaping to avoid cutting mid-escape-sequence # For invalid encodings, use bytesize instead of length to avoid encoding errors original_size = value.bytesize if original_size > max serialized.update(truncated: true, size: original_size) value = value.byteslice(0...max) end value = escape_binary_string(value) # steep:ignore ArgumentTypeMismatch false # Already converted to a new string else # Truncate non-binary strings if value.length > max serialized.update(truncated: true, size: value.length) value = value[0...max] need_dup = false end value = value.dup if need_dup end serialized.update(value: value) when Array if depth < 0 serialized.update(notCapturedReason: "depth") else max = settings.dynamic_instrumentation.max_capture_collection_size if max != 0 && value.length > max serialized.update(notCapturedReason: "collectionSize", size: value.length) # same steep failure with array slices. # https://github.com/soutaro/steep/issues/1219 value = value[0...max] || [] end entries = value.map do |elt| serialize_value(elt, depth: depth - 1) end serialized.update(elements: entries) end when Hash if depth < 0 serialized.update(notCapturedReason: "depth") else max = settings.dynamic_instrumentation.max_capture_collection_size cur = 0 entries = [] value.each do |k, v| if max != 0 && cur >= max serialized.update(notCapturedReason: "collectionSize", size: value.length) break end cur += 1 entries << [serialize_value(k, depth: depth - 1), serialize_value(v, name: k, depth: depth - 1)] end serialized.update(entries: entries) end else if depth < 0 serialized.update(notCapturedReason: "depth") else fields = {} cur = 0 # MRI and JRuby 9.4.5+ preserve instance variable definition # order when calling #instance_variables. Previous JRuby versions # did not preserve order and returned the variables in arbitrary # order. # # The arbitrary order is problematic because 1) when there are # fewer instance variables than capture limit, the order in which # the variables are shown in UI will change from one capture to # the next and generally will be arbitrary to the user, and # 2) when there are more instance variables than capture limit, # *which* variables are captured will also change meaning user # looking at the UI may have "new" instance variables appear and # existing ones disappear as they are looking at multiple captures. # # For consistency, we should have some kind of stable order of # instance variables on all supported Ruby runtimes, so that the UI # stays consistent. Given that initial implementation of Ruby DI # does not support JRuby, we don't handle JRuby's lack of ordering # of #instance_variables here, but if JRuby is supported in the # future this may need to be addressed. ivars = value.instance_variables ivars.each do |ivar| if cur >= attribute_count serialized.update(notCapturedReason: "fieldCount", fields: fields) break end cur += 1 fields[ivar] = serialize_value(value.instance_variable_get(ivar), name: ivar, depth: depth - 1) end serialized.update(fields: fields) end end serialized rescue Exception => exc # standard:disable Lint/RescueException # Re-raise fatal exceptions that should not be caught # (signals, interrupts, system exit) raise if FATAL_EXCEPTION_CLASSES.any? { |klass| exc.is_a?(klass) } # Catch all other exceptions including SystemStackError and NoMemoryError. # These inherit from Exception (not StandardError) but can occur during # serialization (e.g., infinite recursion in custom serializers, memory # exhaustion from large objects) and should return a safe structure # rather than propagating to the transport layer. telemetry&.report(exc, description: "Error serializing") {type: class_name(cls), notSerializedReason: exc.to_s} end end |
#serialize_value_for_message(value, depth = 1) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
This method is used for serializing arbitrary values into log messages. Because the output is meant to be human-readable, we cannot use the “normal” serialization format which is meant to be machine-readable. Serialize objects with depth of 1 and include the class name.
Note that this method does not (currently) utilize the custom serializers that the “normal” serialization logic uses.
This serializer differs from the RFC in two ways:
-
We omit the middle of long strings rather than the end, and also the inner entries in arrays/hashes/objects.
-
We use Ruby-ish syntax for hashes and objects.
We also use the Ruby-like syntax for symbols, which don’t exist in other languages.
361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 |
# File 'lib/datadog/di/serializer.rb', line 361 def (value, depth = 1) # This method is more verbose than "normal" Ruby code to avoid # array allocations. case value when NilClass 'nil' when Integer, Float, TrueClass, FalseClass, Time, Date value.to_s when String (value) when Symbol ':' + (value) when Array return '...' if depth <= 0 max = if value.length > max value_ = value[0...max - 1] || [] value_ << '...' value_ << value[-1] value = value_ end '[' + value.map do |item| (item, depth - 1) end.join(', ') + ']' when Hash return '...' if depth <= 0 max = keys = value.keys truncated = false if value.length > max keys_ = keys[0...max - 1] || [] keys_ << keys[-1] keys = keys_ truncated = true end serialized = keys.map do |key| "#{serialize_value_for_message(key, depth - 1)} => #{serialize_value_for_message(value[key], depth - 1)}" end if truncated serialized[serialized.length] = serialized[serialized.length - 1] serialized[serialized.length - 2] = '...' end "{#{serialized.join(", ")}}" else return '...' if depth <= 0 vars = value.instance_variables truncated = false max = if vars.length > max vars_ = vars[0...max - 1] || [] vars_ << vars[-1] truncated = true vars = vars_ end serialized = vars.map do |var| # +var+ here is always the instance variable name which is a # symbol, we do not need to run it through our serializer. "#{var}=#{serialize_value_for_message(value.send(:instance_variable_get, var), depth - 1)}" end if truncated serialized << serialized.last serialized[-2] = '...' end serialized = if serialized.any? ' ' + serialized.join(' ') end "#<#{class_name(value.class)}#{serialized}>" end rescue => exc telemetry&.report(exc, description: "Error serializing for message") # TODO class_name(foo) can also fail, which we don't handle here. # Telemetry reporting could potentially also fail? "#<#{class_name(value.class)}: serialization error>" end |
#serialize_vars(vars, depth: settings.dynamic_instrumentation.max_capture_depth, attribute_count: settings.dynamic_instrumentation.max_capture_attribute_count) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
Serializes variables captured by a line probe.
These are normally local variables that exist on a particular line of executed code.
140 141 142 143 144 145 146 |
# File 'lib/datadog/di/serializer.rb', line 140 def serialize_vars(vars, depth: settings.dynamic_instrumentation.max_capture_depth, attribute_count: settings.dynamic_instrumentation.max_capture_attribute_count) vars.each_with_object({}) do |(k, v), agg| agg[k] = serialize_value(v, name: k, depth: depth, attribute_count: attribute_count) end end |