# Filter results based on their data
The steps “Remove results which do not contain certain words” and “Remove results which contain certain words” let you to filter data such as scrape results or the contents of a Google Sheet. These steps work by checking the data for the given words and removing the whole row depending on whether the words are found or not.
To use these steps:
- In the “Data” field, select the data you wish to remove rows from.
- In the “Words” field, enter either a list of any number of words to check for, separated by commas, or data containing a list of words. Depending on the step you are using, any row that does not contain or contains the given word will be removed.
- Specify the “Word matching mode": when selecting “Any”, if any of the specified words are found/not found in a result, that result will be removed/retained. For instance, if you specify "apple, banana", any row containing either "apple" or "banana" (or both) will be excluded when using “Remove results containing specified words”. Selecting "All" means that a row will only be removed or filtered out if it contains all of the specified words. Following the same example, a row would only be excluded if it contains both "apple" and "banana".
- Check “Match on word boundary” to match only when the complete word appears in the data. Only the characters a-z A-Z 0-9 and _ are considered to be part of a word, all other characters are considered as being part of a word boundary.
- Optionally, you can specify Columns which need to be checked. When this option is enabled, only these columns will be checked.
# Remove Duplicates
Use the "Remove Duplicates" step to deduplicate your data. If the duplicate is found, the filter function will exclude that result from the final output.
This step will only remove a duplicate if the entire column is duplicated. To check for duplicates only in particular columns, toggle on “Columns to check” and enter the letters of the columns you wish to check, separated by commas. If all the specified columns match, the row will be removed.
# Replace data or text
The "Replace text" step in a data is used to substitute specific text or data values within your dataset with other values:
- The first step is to specify the text or data you want to replace. This can be a specific string or custom data.
- Next, you provide the text or data that will replace the identified targets. This can be a static value or a dynamic value based on your custom data, or even an empty string if you want to remove the target text.
- The Axiom will then scan through your data or text, looking for instances of the target. Each time it finds a match, it substitutes the target with the replacement value.
This operation can be extremely helpful in various scenarios, such as data cleaning, transforming data formats, redacting sensitive information, correcting errors, and so on.
# Split data into parts
Splitting text allows you to break up a string of text into smaller pieces, usually based on a specific character or word.
Here's a simple explanation of how to use the "split text" operation:
- Specify the delimiter. The delimiter is the character or word that separates the pieces of data in your text.
- When you run your axiom, the system will scan through your text, looking for instances of the delimiter. Each time it finds one, it "splits" the text into separate pieces at that point. The delimiter will be deleted from the data and the data will be split into separate columns based on the delimiter.
- Once your data is split, you can perform further operations with the output results based on your requirements.
# Append or merge data
“Append or Merge data” step allows merging two pieces of data together by appending one to the other.
You can append either horizontally (which adds the columns from data set B to the right of the columns in data set A) or vertically (which appends the rows from data set B to the bottom of data set A).
For example, let’s take two datasets, A and B.
If we merged this dataset horizontally, the output would then be:
Whereas if we merged it vertically, it would instead be:
Depending on the type of data in the two variables, a horizontal or vertical append will make more or less sense. In the example above, the horizontal append is more natural.
To append data:
- Specify data to be appended to
- Choose the data to append
- Select whether to append the data horizontally or vertically
# Select random rows from data
"Select random rows" step allows creating a random sample from the data. Simply decide how many random rows you want to select from your dataset and enter it into the “Rows” field:
# Split full names into separate parts
Takes any full name and splits it into its parts - title, first name, last name and additional names:
- Select the data you want to manipulate.
- Specify the column that contains the names. You can do this by entering either a number (where counting starts from 1) or a capital letter (starting from A), depending on how your data is structured. Only one column should be specified, as the operation will only use the first column provided.
- Specify the Name Components to Split: Select the components into which the name should be split: Title, First Name, Last Name, and Other Names. Each of these components will be split into a separate column in the order in which these elements are selected.
# Transpose rows and columns
Use the step “Swap rows and columns” to swap (transpose) the rows and columns in your data. Rows become columns and columns become rows.
# Join different data sources based on a matching column
You can combine different data collected from various sources by using a common column to merge the two sets using the “Join different data sources based on matching column” step.
This is useful when you have two unordered sets of data that you need to piece together into a complete, single dataset - it works just like a Join in a database.
Given two sources of data, such as a Google sheet:
And scraped data from the “Get data from a webpage” step:
The “Join different data sources based on matching column” step will merge these data sources:
To merge different data sources:
- Click the “Add step” button
- Base data: Enter the base data to join with another
- Base data column: Specify the column of data to use as the basis of the join. For example, if you want to join based on company name, enter the column number or letter which contains the company name here.
- Join data: Enter the data you want to join with the base data.
- Column: Specify a particular column of the base data to always match against. Leave this blank to match with any column.
- Fuzzyness of the match: Ranges from 0 to 1. 0 indicates that the match must be exact, and 1 indicates that anything will match. It's recommended to start with a low value here.
# Count rows
As its name suggests, this step will return the number of rows in a piece of data.
The main use case for this step is to pass into one of the Google Sheet steps, which will allow you to precisely specify the number of rows you wish to write, clear or delete based on the number of results in some other step (for example, scrape results).