Package-level declarations
Types
The results of an Action.
A task to execute on the completion of a job. See https://cloud.google.com/dlp/docs/concepts-actions to learn more.
Apply transformation to all findings.
Apply to all text.
Result of a risk analysis operation request.
An auxiliary table contains statistical information on the relative frequency of different quasi-identifiers values. It has one or several quasi-identifiers columns, and one column that indicates the relative frequency of each quasi-identifier tuple. If a tuple is present in the data but not in the auxiliary table, the corresponding relative frequency is assumed to be zero (and thus, the tuple is highly reidentifiable).
Message defining a field of a BigQuery table.
Options defining BigQuery table and row identifiers.
Message defining the location of a BigQuery table. A table is uniquely identified by its project_id, dataset_id, and table_name. Within a query a table is often referenced with a string in the format of: <project_id>:<dataset_id>.<table_id> or <project_id>.<dataset_id>.<table_id>.
Generalization function that buckets values based on ranges. The ranges and replacement values are dynamically provided by the user for custom behavior, such as 1-30 -> LOW, 31-65 -> MEDIUM, 66-100 -> HIGH. This can be used on data of type: number, long, string, timestamp. If the bound Value type differs from the type of data being transformed, we will first attempt converting the type of the data to be transformed to match the type of the bound before comparing. See https://cloud.google.com/dlp/docs/concepts-bucketing to learn more.
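The range-based bucketing described above can be sketched in a few lines of Python. This is an illustration only, not the service's implementation: the `BUCKETS` table and the `bucketize` function are hypothetical names, and the bounds are treated as inclusive on both ends purely to mirror the 1-30 / 31-65 / 66-100 example.

```python
# Hypothetical bucket table: (lower bound, upper bound, replacement value).
# Both bounds are treated as inclusive here; this is an assumption made to
# mirror the example ranges, not a statement about the service.
BUCKETS = [(1, 30, "LOW"), (31, 65, "MEDIUM"), (66, 100, "HIGH")]


def bucketize(value, buckets=BUCKETS):
    """Return the replacement of the first bucket containing value.

    The value is first converted to the type of the bucket's bound, as the
    description suggests. If conversion fails, or no bucket matches, the
    value is returned unchanged (an assumption of this sketch).
    """
    for lower, upper, replacement in buckets:
        try:
            candidate = type(lower)(value)
        except (TypeError, ValueError):
            return value
        if lower <= candidate <= upper:
            return replacement
    return value


print(bucketize(15))    # falls in the 1-30 bucket
print(bucketize("42"))  # string converted to int before comparing
```

Converting the data toward the bound's type, rather than the other way around, keeps the bucket table as the single source of truth for comparison semantics.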
Bucket is represented as a range, along with replacement values.
Compute numerical stats over an individual column, including number of distinct values and value count distribution.
Histogram of value frequencies in the column.
Result of the categorical stats computation.
Partially mask a string by replacing a given number of characters with a fixed character. Masking can start from the beginning or end of the string. This can be used on data of any type (numbers, longs, and so on) and when de-identifying structured data we'll attempt to preserve the original data's type. (This allows you to take a long like 123 and modify it to a string like **3.)
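As a rough illustration of partial masking, the following Python sketch replaces a fixed number of characters with a masking character, starting from either end. The function `mask` and its parameter names are hypothetical and chosen for readability; the choice that ignored characters do not count toward the number masked is an assumption of this sketch.

```python
def mask(value, number_to_mask, masking_character="*",
         reverse_order=False, chars_to_ignore=""):
    """Mask up to number_to_mask characters of value.

    The value is converted to a string first, so a number such as 123
    comes back as a string. Characters listed in chars_to_ignore are left
    alone and are not counted (an assumption of this sketch).
    """
    chars = list(str(value))
    if reverse_order:
        indices = range(len(chars) - 1, -1, -1)  # walk from the end
    else:
        indices = range(len(chars))              # walk from the beginning
    remaining = number_to_mask
    for i in indices:
        if remaining == 0:
            break
        if chars[i] in chars_to_ignore:
            continue
        chars[i] = masking_character
        remaining -= 1
    return "".join(chars)


print(mask(123, 2))  # the long 123 becomes the string **3
print(mask("4012-8888", 4, reverse_order=True, chars_to_ignore="-"))
```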
Characters to skip when doing deidentification of a value. These will be left alone and skipped.
Message representing a set of files in Cloud Storage.
Options defining a file or a set of files within a Cloud Storage bucket.
Message representing a single file or path in Cloud Storage.
Message representing a set of files in a Cloud Storage bucket. Regular expressions are used to allow fine-grained control over which files in the bucket to include. Included files are those that match at least one item in include_regex and do not match any items in exclude_regex. Note that a file that matches items from both lists will not be included. For a match to occur, the entire file path (i.e., everything in the URL after the bucket name) must match the regular expression. For example, given the input {bucket_name: "mybucket", include_regex: ["directory1/.*"], exclude_regex: ["directory1/excluded.*"]}:
* gs://mybucket/directory1/myfile will be included
* gs://mybucket/directory1/directory2/myfile will be included (.* matches across /)
* gs://mybucket/directory0/directory1/myfile will not be included (the full path doesn't match any items in include_regex)
* gs://mybucket/directory1/excludedfile will not be included (the path matches an item in exclude_regex)
If include_regex is left empty, it will match all files by default (this is equivalent to setting include_regex: [".*"]). Some other common use cases:
* {bucket_name: "mybucket", exclude_regex: [".*\.pdf"]} will include all files in mybucket except for .pdf files
* {bucket_name: "mybucket", include_regex: ["directory/[^/]+"]} will include all files directly under gs://mybucket/directory/, without matching across /
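The include/exclude rules above can be checked locally with a small Python sketch. The helper `is_included` is a hypothetical name, and Python's `re` module is used only as a stand-in for whatever regular expression engine the service applies; the two dialects may differ on advanced syntax. Paths are given relative to the bucket, that is, everything after the bucket name.

```python
import re


def is_included(path, include_regex=(), exclude_regex=()):
    """Return True if path would be selected by the two regex lists.

    An empty include list is treated as [".*"]. A pattern must match the
    entire path (re.fullmatch), and any exclude match wins over an include.
    """
    includes = list(include_regex) or [".*"]
    matches_include = any(re.fullmatch(p, path) for p in includes)
    matches_exclude = any(re.fullmatch(p, path) for p in exclude_regex)
    return matches_include and not matches_exclude


include = ["directory1/.*"]
exclude = ["directory1/excluded.*"]
for candidate in ["directory1/myfile",
                  "directory1/directory2/myfile",
                  "directory0/directory1/myfile",
                  "directory1/excludedfile"]:
    print(candidate, is_included(candidate, include, exclude))
```

Using a full match rather than a search is what makes directory0/directory1/myfile fall outside the include pattern, even though the pattern text appears inside the path.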
Represents a color in the RGB color space.
The field type of value and field do not need to match to be considered equal, but not all comparisons are possible. EQUAL_TO and NOT_EQUAL_TO attempt to compare even with incompatible types, but all other comparisons are invalid with incompatible types. A value of type:
- string can be compared against all other types.
- boolean can only be compared against other booleans.
- integer can be compared against doubles or a string if the string value can be parsed as an integer.
- double can be compared against integers or a string if the string can be parsed as a double.
- Timestamp can be compared against strings in RFC 3339 date string format.
- TimeOfDay can be compared against timestamps and strings in the format of 'HH:mm:ss'.
If we fail to compare due to a type mismatch, a warning will be given and the condition will evaluate to false.
A collection of conditions.
Pseudonymization method that generates deterministic encryption for the given input. Outputs a base64 encoded representation of the encrypted output. Uses AES-SIV based on the RFC https://tools.ietf.org/html/rfc5297.
Pseudonymization method that generates surrogates via cryptographic hashing. Uses SHA-256. The key size must be either 32 or 64 bytes. Outputs a base64 encoded representation of the hashed output (for example, L7k0BHmF1ha5U3NfGykjro4xWi1MPVQPjhMAZbSV9mM=). Currently, only string and integer values can be hashed. See https://cloud.google.com/dlp/docs/pseudonymization to learn more.
This is a data encryption key (DEK), as opposed to a key encryption key (KEK) stored by Cloud Key Management Service (Cloud KMS). When using Cloud KMS to wrap or unwrap a DEK, be sure to set an appropriate IAM policy on the KEK to ensure an attacker cannot unwrap the DEK.
Replaces an identifier with a surrogate using Format Preserving Encryption (FPE) with the FFX mode of operation; however, when used in the ReidentifyContent API method, it serves the opposite function by reversing the surrogate back into the original identifier. The identifier must be encoded as ASCII. For a given crypto key and context, the same identifier will be replaced with the same surrogate. Identifiers must be at least two characters long. In the case that the identifier is the empty string, it will be skipped. See https://cloud.google.com/dlp/docs/pseudonymization to learn more. Note: We recommend using CryptoDeterministicConfig for all use cases which do not require preserving the input alphabet space and size, plus warrant referential integrity.
Custom information type provided by the user. Used to find domain-specific sensitive information configurable to the data in question.
Options defining a data set within Google Cloud Datastore.
Shifts dates by random number of days, with option to be consistent for the same context. See https://cloud.google.com/dlp/docs/concepts-date-shifting to learn more.
The configuration that controls how the data will change.
The results of a Deidentify action from an inspect job.
Summary of what was modified during a transformation.
Create a de-identified copy of the requested table or files. A TransformationDetail will be created for each transformation. If any rows in BigQuery are skipped during de-identification (transformation errors or row size exceeds BigQuery insert API limits), they are placed in the failure output table. If the original row exceeds the BigQuery insert API limit, it will be truncated when written to the failure output table. The failure output table can be set in the action.deidentify.output.big_query_output.deidentified_failure_output_table field; if no table is set, a table will be automatically created in the same project and dataset as the original table. Compatible with: Inspect
DeidentifyTemplates contains instructions on how to de-identify content. See https://cloud.google.com/dlp/docs/concepts-templates to learn more.
δ-presence metric, used to estimate how likely it is for an attacker to figure out that one given individual appears in a de-identified dataset. Similarly to the k-map metric, we cannot compute δ-presence exactly without knowing the attack dataset, so we use a statistical model instead.
A DeltaPresenceEstimationHistogramBucket message with the following values: min_probability: 0.1, max_probability: 0.2, frequency: 42 means that there are 42 records for which δ is in [0.1, 0.2). An important particular case is when min_probability = max_probability = 1: then, every individual who shares this quasi-identifier combination is in the dataset.
A tuple of values for the quasi-identifier columns.
Result of the δ-presence computation. Note that these results are an estimation, not exact values.
Deprecated; use InspectionRuleSet instead. Rule for modifying a CustomInfoType to alter behavior under certain circumstances, depending on the specific details of the rule. Not supported for the surrogate_type custom infoType.
Custom information type based on a dictionary of words or phrases. This can be used to match sensitive information specific to the data, such as a list of employee IDs or job titles. Dictionary words are case-insensitive and all characters other than letters and digits in the Unicode Basic Multilingual Plane will be replaced with whitespace when scanning for matches, so the dictionary phrase "Sam Johnson" will match all three phrases "sam johnson", "Sam, Johnson", and "Sam (Johnson)". Additionally, the characters surrounding any match must be of a different type than the adjacent characters within the word, so letters must be next to non-letters and digits next to non-digits. For example, the dictionary word "jen" will match the first three letters of the text "jen123" but will return no matches for "jennifer". Dictionary words containing a large number of characters that are not letters or digits may result in unexpected findings because such characters are treated as whitespace. The limits page contains details about the size limits of dictionaries. For dictionaries that do not fit within these constraints, consider using LargeCustomDictionaryConfig in the StoredInfoType API.
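The dictionary matching rules can be approximated with the Python sketch below. It is a simplified model built only from the description above, not the service's matcher: `dictionary_matches` and its helpers are hypothetical names, Python's `str.isalnum`, `str.isalpha`, and `str.isdigit` stand in for the letter and digit classes, and runs of whitespace are assumed to be interchangeable.

```python
import re


def _normalize(text):
    # Lowercase, and replace every character that is not a letter or digit
    # with a space (approximated here with str.isalnum).
    return "".join(ch.lower() if ch.isalnum() else " " for ch in text)


def _kind(ch):
    # Classify a character so that boundary characters can be compared.
    if ch.isalpha():
        return "letter"
    if ch.isdigit():
        return "digit"
    return "other"


def dictionary_matches(phrase, text):
    """Return True if phrase is found in text under the sketched rules."""
    words = _normalize(phrase).split()
    if not words:
        return False
    pattern = r"\s+".join(re.escape(word) for word in words)
    haystack = _normalize(text)
    for m in re.finditer(pattern, haystack):
        before = haystack[m.start() - 1] if m.start() > 0 else " "
        after = haystack[m.end()] if m.end() < len(haystack) else " "
        # The character on each side must be of a different kind than the
        # adjacent character inside the match.
        if (_kind(before) != _kind(haystack[m.start()])
                and _kind(after) != _kind(haystack[m.end() - 1])):
            return True
    return False


print(dictionary_matches("Sam Johnson", "Sam (Johnson)"))
print(dictionary_matches("jen", "jennifer"))
```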
An entity in a dataset is a field or set of fields that correspond to a single person. For example, in medical records the EntityId might be a patient identifier, or for financial records it might be an account identifier. This message is used when generalizations or analysis must take into account that multiple rows correspond to the same entity.
Detailed information about an error encountered during job execution or the results of an unsuccessful activation of the JobTrigger.
The rule to exclude findings based on a hotword. For record inspection of tables, column names are considered hotwords. An example of this is to exclude a finding if it belongs to a BigQuery column that matches a specific pattern.
List of excluded infoTypes.
The rule that specifies conditions when findings of infoTypes specified in InspectionRuleSet are removed from results.
An expression, consisting of an operator and conditions.
General identifier of a data field in a storage service.
The transformation to apply to the field.
Set of files to scan.
Configuration to control the number of findings returned for inspection. This is not used for de-identification or data profiling. When redacting sensitive data from images, finding limits don't apply. They can cause unexpected or inconsistent results, where only some data is redacted. Don't include finding limits in RedactImage requests. Otherwise, Cloud DLP returns an error.
Buckets values based on fixed size ranges. The Bucketing transformation can provide all of this functionality, but requires more configuration. This message is provided as a convenience to the user for simple bucketing strategies. The transformed value will be a hyphenated string of {lower_bound}-{upper_bound}. For example, if lower_bound = 10 and upper_bound = 20, all values that are within this bucket will be replaced with "10-20". This can be used on data of type: double, long. If the bound Value type differs from the type of data being transformed, we will first attempt converting the type of the data to be transformed to match the type of the bound before comparing. See https://cloud.google.com/dlp/docs/concepts-bucketing to learn more.
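A minimal Python sketch of fixed-size bucketing follows. The function `fixed_size_bucket` and the `bucket_size` parameter are illustrative names; the sketch assumes each bucket includes its lower edge and excludes its upper edge, and it simply leaves values outside the overall range untouched, since the text above does not say how the service treats them.

```python
import math


def fixed_size_bucket(value, lower_bound, upper_bound, bucket_size):
    """Replace value with a hyphenated "{low}-{high}" bucket label.

    Buckets start at lower_bound and are bucket_size wide. Values outside
    [lower_bound, upper_bound) are returned unchanged (an assumption of
    this sketch, not documented behavior).
    """
    if value < lower_bound or value >= upper_bound:
        return value
    offset = math.floor((value - lower_bound) / bucket_size)
    low = lower_bound + offset * bucket_size
    return f"{low}-{low + bucket_size}"


print(fixed_size_bucket(15, 10, 50, 10))    # value inside the 10-20 bucket
print(fixed_size_bucket(12.5, 10, 50, 10))  # doubles land in the same bucket
```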
The rule that adjusts the likelihood of findings within a certain proximity of hotwords.
Statistics related to processing hybrid inspect requests.
Configuration to control jobs where the content being inspected is outside of Google Cloud Platform.
Configuration for determining how redaction of images should occur.
A type of transformation that is applied over images.
Max findings configuration per infoType, per content item or long running DlpJob.
Type of information detected by the API.
Statistics regarding a specific InfoType.
A transformation to apply to text that is identified as a specific info_type.
A type of transformation that will scan unstructured text and apply various PrimitiveTransformations to each finding, where the transformation is applied to only values that were identified as a specific info_type.
Configuration description of the scanning process. When used with redactContent only info_types and min_likelihood are currently used.
The results of an inspect DataSource job.
A single inspection rule to be applied to infoTypes, specified in InspectionRuleSet.
Rule set for modifying a set of infoTypes to alter behavior under certain circumstances, depending on the specific details of the rules within the set.
Controls what and how to inspect for findings.
The inspectTemplate contains a configuration (set of types of sensitive data to be detected) to be used anywhere you otherwise would normally specify InspectConfig. See https://cloud.google.com/dlp/docs/concepts-templates to learn more.
Sends an email when the job completes. The email goes to IAM project owners and technical Essential Contacts.
k-anonymity metric, used for analysis of reidentification risk.
The set of columns' values that share the same ldiversity value.
Histogram of k-anonymity equivalence classes.
Result of the k-anonymity computation.
A representation of a Datastore kind.
Reidentifiability metric. This corresponds to a risk model similar to what is called "journalist risk" in the literature, except the attack dataset is statistically modeled instead of being perfectly known. This can be done using publicly available data (like the US Census), or using a custom statistical model (indicated as one or several BigQuery tables), or by extrapolating from the distribution of values in the input dataset.
A KMapEstimationHistogramBucket message with the following values: min_anonymity: 3, max_anonymity: 5, frequency: 42 means that there are 42 records whose quasi-identifier values correspond to 3, 4 or 5 people in the overlying population. An important particular case is when min_anonymity = max_anonymity = 1: the frequency field then corresponds to the number of uniquely identifiable records.
A tuple of values for the quasi-identifier columns.
Result of the reidentifiability analysis. Note that these results are an estimation, not exact values.
Include to use an existing data crypto key wrapped by KMS. The wrapped key must be a 128-, 192-, or 256-bit key. Authorization requires the following IAM permissions when sending a request to perform a crypto transformation using a KMS-wrapped crypto key: dlp.kms.encrypt. For more information, see Creating a wrapped key (https://cloud.google.com/dlp/docs/create-wrapped-key). Note: When you use Cloud KMS for cryptographic operations, charges apply.
Configuration for a custom dictionary created from a data source of any size up to the maximum size defined in the limits page. The artifacts of dictionary creation are stored in the specified Cloud Storage location. Consider using CustomInfoType.Dictionary for smaller dictionaries that satisfy the size requirements.
Summary statistics of a custom dictionary.
l-diversity metric, used for analysis of reidentification risk.
The set of columns' values that share the same ldiversity value.
Histogram of l-diversity equivalence class sensitive value frequencies.
Result of the l-diversity computation.
Skips the data without modifying it if the requested transformation would cause an error. For example, if a DateShift transformation were applied to an IP address, this mode would leave the IP address unchanged in the response.
Message for specifying an adjustment to the likelihood of a finding as part of a detection rule.
Job trigger option for hybrid jobs. Jobs must be manually created and finished.
Compute numerical stats over an individual column, including min, max, and quantiles.
Result of the numerical stats computation.
Cloud repository for storing output.
Datastore partition ID. A partition ID identifies a grouping of entities. The grouping is always by project and namespace, however the namespace ID may be empty. A partition ID contains several dimensions: project ID and namespace ID.
A rule for transforming a value.
Privacy metric to compute for reidentification risk analysis.
Message for specifying a window around a finding to apply a detection rule.
Publish findings of a DlpJob to Data Catalog. In Data Catalog, tag templates are applied to the resource that Cloud DLP scanned. Data Catalog tag templates are stored in the same project and region where the BigQuery table exists. For Cloud DLP to create and apply the tag template, the Cloud DLP service agent must have the roles/datacatalog.tagTemplateOwner permission on the project. The tag template contains fields summarizing the results of the DlpJob. Any field values previously written by another DlpJob are deleted. InfoType naming patterns are strictly enforced when using this feature. Findings are persisted in Data Catalog storage and are governed by service-specific policies for Data Catalog. For more information, see Service Specific Terms. Only a single instance of this action can be specified. This action is allowed only if all resources being scanned are BigQuery tables. Compatible with: Inspect
Publish the result summary of a DlpJob to Security Command Center. This action is available for only projects that belong to an organization. This action publishes the count of finding instances and their infoTypes. The summary of findings are persisted in Security Command Center and are governed by service-specific policies for Security Command Center. Only a single instance of this action can be specified. Compatible with: Inspect
Publish a message into a given Pub/Sub topic when DlpJob has completed. The message contains a single field, DlpJobName, which is equal to the finished job's DlpJob.name. Compatible with: Inspect, Risk
Enable Stackdriver metric dlp.googleapis.com/finding_count. This will publish a metric to Stackdriver for each infoType requested, with the number of findings found for it. CustomDetectors will be bucketed as 'Custom' under the Stackdriver label 'info_type'.
A quasi-identifier column has a custom_tag, used to know which column in the data corresponds to which column in the statistical model.
A quasi-identifier column has a custom_tag, used to know which column in the data corresponds to which column in the statistical model.
A column with a semantic tag attached.
A condition for determining whether a transformation should be applied to a field.
Configuration to suppress records whose suppression conditions evaluate to true.
A type of transformation that is applied over structured data such as a table.
Redact a given value. For example, if used with an InfoTypeTransformation transforming PHONE_NUMBER, and input 'My phone number is 206-555-0123', the output would be 'My phone number is '.
Message defining a custom regular expression.
Replace each input value with a value randomly selected from the dictionary.
Replace each input value with a given Value.
Replace each matching finding with the name of the info_type.
De-id options.
Snapshot of the inspection configuration.
Risk analysis options.
All result fields mentioned below are updated while the job is processing.
Configuration for a risk analysis job. See https://cloud.google.com/dlp/docs/concepts-risk-analysis to learn more.
If set, the detailed findings will be persisted to the specified OutputStorageConfig. Only a single instance of this action can be specified. Compatible with: Inspect, Risk
Schedule for inspect job triggers.
Apply transformation to the selected info_types.
Score is calculated from all elements in the data profile. A higher level means the data is more sensitive.
An auxiliary table containing statistical information on the relative frequency of different quasi-identifiers values. It has one or several quasi-identifiers columns, and one column that indicates the relative frequency of each quasi-identifier tuple. If a tuple is present in the data but not in the auxiliary table, the corresponding relative frequency is assumed to be zero (and thus, the tuple is highly reidentifiable).
Shared message indicating Cloud storage type.
Configuration for stored infoTypes. All fields and subfields are provided by the user. For more information, see https://cloud.google.com/dlp/docs/creating-custom-infotypes.
Statistics for a StoredInfoType.
Version of a StoredInfoType, including the configuration used to build it, create timestamp, and current state.
A reference to a StoredInfoType to use with scanning.
Message for detecting output from deidentification transformations such as CryptoReplaceFfxFpeConfig. These types of transformations are those that perform pseudonymization, thereby producing a "surrogate" as output. This should be used in conjunction with a field on the transformation such as surrogate_info_type. This CustomInfoType does not support the use of detection_rules.
Instructions regarding the table content being inspected.
A column with a semantic tag attached.
Throw an error and fail the request when a transformation error occurs.
For use with Date, Timestamp, and TimeOfDay, extract or preserve a portion of the value.
Configuration of the timespan of the items to include in scanning. Currently only supported when inspecting Cloud Storage and BigQuery.
User-specified templates and configs for how to de-identify structured, unstructured, and image files. The user must provide either an unstructured de-identify template or at least one redact image config.
Config for storing transformation details.
How to handle transformation errors during de-identification. A transformation error occurs when the requested transformation is incompatible with the data. For example, trying to de-identify an IP address using a DateShift transformation would result in a transformation error, since date info cannot be extracted from an IP address. Information about any incompatible transformations, and how they were handled, is returned in the response as part of the TransformationOverviews.
Use this to have a random data crypto key generated. It will be discarded after the request finishes.
What event needs to occur for a new job to be started.
Using raw keys is prone to security risks due to accidentally leaking the key. Choose another type of key if possible.
A value of a field, including its frequency.
Set of primitive values supported by the system. Note that for the purposes of inspection or transformation, the number of bytes considered to comprise a 'Value' is based on its representation as a UTF-8 encoded string. For example, if 'integer_value' is set to 123456789, the number of bytes would be counted as 9, even though an int64 only holds up to 8 bytes of data.
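The byte-counting rule above can be illustrated with a short Python sketch. The helper `value_size_in_bytes` is a hypothetical name, and using Python's `str()` to produce the textual form is an assumption; the service's own string representation of non-string values may differ.

```python
def value_size_in_bytes(value):
    """Count the bytes of a value's UTF-8 encoded string representation."""
    return len(str(value).encode("utf-8"))


# Nine digits give nine bytes, even though the number fits in an int64.
print(value_size_in_bytes(123456789))
# Non-ASCII characters can take more than one byte each.
print(value_size_in_bytes("na\u00efve"))
```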
Message defining a list of words or phrases to search for in the data.
A generic empty message that you can re-use to avoid defining duplicated empty messages in your APIs. A typical example is to use it as the request or the response type of an API method. For instance: service Foo { rpc Bar(google.protobuf.Empty) returns (google.protobuf.Empty); }
The Status type defines a logical error model that is suitable for different programming environments, including REST APIs and RPC APIs. It is used by gRPC. Each Status message contains three pieces of data: error code, error message, and error details. You can find out more about this error model and how to work with it in the API Design Guide.
Represents a whole or partial calendar date, such as a birthday. The time of day and time zone are either specified elsewhere or are insignificant. The date is relative to the Gregorian Calendar. This can represent one of the following: * A full date, with non-zero year, month, and day values. * A month and day, with a zero year (for example, an anniversary). * A year on its own, with a zero month and a zero day. * A year and month, with a zero day (for example, a credit card expiration date). Related types: * google.type.TimeOfDay * google.type.DateTime * google.protobuf.Timestamp
Represents a time of day. The date and time zone are either not significant or are specified elsewhere. An API may choose to allow leap seconds. Related types are google.type.Date and google.protobuf.Timestamp.