4.4.11 Worksheet Query - Quick Start

To open Worksheet Query dialog, you can activate the worksheet and choose menu Worksheet: Worksheet Query. This dialog does row-wise data extraction based on user specified conditions.

Simple Tutorial

Extract Worksheet Data 05.png
Note:For the columns' information listed horizontally in the left panel, you can right-click any place within this panel to select the items you desired to show, such as Column Index, Short Name or Long Name of columns.
  1. Use the Import Wizard to import the file \Samples\Data Manipulation\US Metropolitan Area Population.dat. The destination worksheet contains four columns: Population, Sq.Mi., Density, and Metropolitan Area.
  2. Select Worksheet: Worksheet Query from the Origin menu to open Worksheet Query dialog box.
  3. With the ctrl key down, select Density and Metropolitan Area in the Columns: panel. Click the => button to make these columns available in Select Column Variables for If Test panel for constructing the condition. Note that aliases D and M are assigned to the columns respectively.
  4. Clear the check box before Sq.Mi. in the Columns: panel so that this column will not be included in the extracted result.
  5. Enter the following condition formula in the Condition box in Condition: panel. Note: You can click the AND button instead of typing out A-N-D:
    D > 100 AND M == "* CA*" or D > 100 AND M = "* CA*"
    This condition looks for densities above 100 for all areas in California ("CA"). Note that this tool supports wildcard symbols such as * inside strings.
  6. Click the Test -- select if true button. Notice that the number of found rows under the Condition box has been updated, and some rows in the worksheet are highlighted.
  7. Select Auto from Recalculate drop-down list to auto update the results if data changed.
  8. Select the Extract to New Worksheet radio button in Output: panel and then click the OK button. You see that the extracted results are stored in a new worksheet in the source workbook.

What Can I Enter in Condition Text Box?

You can use column aliases to construct conditions using simple operators such as >, <=, !=, etc., arithmetic operators (+, -, *, /, ^) and && (AND) || (OR) logical operators. You can also insert built-in functions from a number of categorized lists accessed from menu Function to build your query condition or create an user defined function.

Numeric Data
Examples Descriptions
Temperature > 0 To extract rows with values in column with alias Temperature that are greater than 0
N^2 != F To extract rows with the square of values in column N (alias) not equal to values in column F (alias).
A != -- (or A != NANUM) To extract rows with no missing values and no text strings in column A (alias).
mod(i, 3) == 0 To extract every 3rd row for all selected rows using mod() function, where i is the row index variable.
String Data
Examples Descriptions
M == "*CA*" (or M = "*ca*") To extract rows contain "CA" within the string in a cell in column with alias "M". Note wildcard symbols such as * (any string of characters), ? (any single character) are supported and string search is not case-sensitive.
Col(Station)[2*i-1]$ == "Northwest" To extract every 3rd row with string value equal to "Northwest" in column with Short/Long Name "Station".

OR A == "--"
OR A == ""

To highlight rows contain numeric data, string "--" and empty space in column with alias "A". Use Enter key to put expressions into multiple lines.
isText(B) To extract rows that only contains text string in column with alias "B" using isText() function.
Date Data
Examples Descriptions
frac(B) > Time(10:00:00) AND frac(B) < Time(11:00:00) To extract data within time period 10:00 ~ 11:00 in column B using frac() and Time() functions.
int(A) > Date(01/24/2004) To extract data later than date 2004/01/24 in column A using int() and Date() functions.

Advanced Scripting Controls on Data Extraction

While its usage is optional, the Script Before If Condition box accessed from menu Scripts:Edit... increases the power of the Worksheet Query dialog. This box takes one or more lines of LabTalk script and runs the script before the expression in the Condition box is executed. This makes possible such things as pre-processing of input data, defining functions, variables and constants for use in the Condition box, etc.

Origin actually runs the data query routine as follows:

  1. Run the Script Before Select Data Loop.
  2. Loop over all rows defined by the start and end values provided by user. For each column whose row index is i:
    1. Run the Script Before If Condition;
    2. Test the condition, for data in row i;
    3. Extract data if the condition is true;
    4. End query;
  3. End loop;

To show the query running routine, you can run the example below which is used to extract data in column A using indices in column B.

  1. Enter commands as shown below in Script Before Select Data Loop, Script Before If Condition and Condition: text boxes as shown in figure below.
  2. Click Test--select if true button and Origin will highlight row 1, 4, and 7 as illustrated.
Extract Worksheet Data 08.png

Quick Examples

Delete Rows with Missing Values

  1. Use Extract Worksheet Data 2.png button to add column "Time" to Select Column Variables for If Test box.
  2. Input T == NANUM (NANUM denotes missing values) in Condition box.
  3. Click Test--select if true button to highlight rows with missing values in column "Time".
  4. Right click on any of the highlighted rows and select Delete from shortcut menu.
Extract Worksheet Data 09.png

Extract Every Nth Row

  1. Input mod(i, 3) == 0 in Condition: text box where i is built-in row index variable and mod(m,n) returns the integer modulus of m divided by n.
  2. Click Test--select if true button to highlight and OK button to extract to new workbook.
Extract Worksheet Data 12.png

Mask Outliers

This example is to show how to mask outliers using 25th and 75th percentiles.
1. Enter commands below in Script Before Select Data Loop text box.

pct = {25, 75};
Q1 = percentile(col(B),pct)[1];
Q3 = percentile(col(B),pct)[2];
IQR = Q3 - Q1;
UIF = Q3 + 1.5*IQR;
LIF = Q1 - 1.5*IQR;

2. Enter Y < LIF OR Y > UIF in Condition: text box.
3. Click Test--select if true button to highlight found outliers.
4. Right click on one of the highlighted rows and select Mask: Apply from appeared context menu.

Extract Worksheet Data 11.png

See Also