Are you keen to get started in the exciting world of data, but find the terminology difficult to digest? Don’t stress – we have the perfect blog to introduce you to all the relevant jargon! Our data analysis glossary gives you an alphabetised list of common technical terms that every data apprentice will need to know.
A
Abstracting – Hiding specific data so that only relevant data can be seen.
Aggregating – The process of collecting and presenting data in a summarised format, such as totals or averages, so that it can be analysed in support of business objectives.
Algorithms – A sequence of instructions that tells a computer how to transform a set of facts into useful information; in short, a method for solving a problem.
Analysis Management – The process of managing and organising how raw data is processed.
Analysis Techniques – The methods used to inspect, cleanse, transform, and present data in order to discover useful information for decision making. There are many different methods and techniques to choose from.
ANOVA – The analysis of variance, a statistical method for determining whether the means of two or more groups differ significantly.
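For instance, here is a minimal one-way ANOVA sketch using SciPy; the three groups of test scores are invented purely for illustration:

```python
# Minimal one-way ANOVA sketch using SciPy (hypothetical data).
from scipy import stats

# Invented test scores for three groups.
group_a = [85, 90, 88, 75, 95]
group_b = [70, 65, 80, 72, 68]
group_c = [90, 92, 88, 94, 91]

# f_oneway returns the F statistic and the p-value; a small p-value
# suggests at least one group mean differs from the others.
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```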
Apriori Algorithm – A classic algorithm for finding association rules between items. It is often used in retail to identify which items are bought alongside one another.
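The full algorithm builds up frequent itemsets step by step; the sketch below shows only the simplest first pass, counting how often pairs of items appear together in some hypothetical shopping baskets:

```python
# A simplified first step of the Apriori idea: count how often pairs of
# items appear together in (hypothetical) shopping baskets.
from collections import Counter
from itertools import combinations

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
    {"bread", "butter", "tea"},
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Keep pairs that meet a minimum support threshold (here, 2 baskets).
frequent_pairs = {p: c for p, c in pair_counts.items() if c >= 2}
print(frequent_pairs)  # {('bread', 'butter'): 3}
```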
Augmented Decision Making – The process of using technology and analytics, combined with a specific business process, to let the machine do the work of combing through the data, seeing patterns and making recommendations.
B
Blockchain Technologies – A chain of blocks that contain information. The data inside a block depends on the type of blockchain. The first block in the chain is called the Genesis block.
C
Classification – The process of analysing structured/unstructured data and organising it into categories based on file type, contents, and other metadata.
Continuous Variable – When a variable can take on any value between its minimum value and its maximum value.
Cross Checking Techniques – Also known as triangulation: the process of validating data against two or more sources to ensure consistency.
D
Data Architecture – Data architecture is made up of models, policies, rules, or standards that govern which data is collected, and how it is stored, arranged, integrated, and put to use in data systems and organisations.
Data Blending – Data blending is a process where data from multiple sources is merged into a single data set. This is also known as data compilation.
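As a small illustration, here is a sketch that blends two hypothetical sales extracts into a single pandas data set:

```python
# Minimal data-blending sketch with pandas: stack rows from two
# hypothetical sources that share the same columns.
import pandas as pd

shop_sales = pd.DataFrame({"product": ["tea", "coffee"], "units": [30, 45]})
web_sales = pd.DataFrame({"product": ["tea", "coffee"], "units": [12, 20]})

blended = pd.concat([shop_sales, web_sales], ignore_index=True)
print(blended)
```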
Data Cleaning – The process of detecting, correcting, or removing corrupt or inaccurate records. This is also known as data cleansing.
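A minimal sketch of the idea with pandas, using made-up records that contain a duplicate and a missing value:

```python
# Minimal data-cleaning sketch: remove duplicate rows and drop records
# with missing values (hypothetical example data).
import pandas as pd

df = pd.DataFrame({
    "name": ["Ana", "Ana", "Ben", None],
    "age": [34, 34, 29, 41],
})

cleaned = df.drop_duplicates().dropna()
print(cleaned)  # one row for Ana, one for Ben
```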
Data Compliance – A method that ensures data is organised and managed in a manner that meets organisational and government guidelines.
Data Insights – Knowledge gained from looking at specific data to highlight patterns and trends.
Data Intelligence – An engineering discipline that augments data science with theory from social science, decision theory, and managerial science.
Data Lifecycle – The data lifecycle starts with the creation of data and moves through storage, use, sharing, and archiving to destruction. How long data is stored before destruction differs depending on industry standards.
Data Linking – The process of combining data from two or more sources into one location, normally by writing code such as SQL to pull the related records together.
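For example, here is a minimal sketch that links two hypothetical tables with an SQL JOIN, using Python’s built-in SQLite support:

```python
# Minimal data-linking sketch: an SQL JOIN pulls related records from
# two tables into one result set (in-memory SQLite, hypothetical data).
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
con.execute("CREATE TABLE orders (customer_id INTEGER, item TEXT)")
con.execute("INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben')")
con.execute("INSERT INTO orders VALUES (1, 'tea'), (2, 'coffee')")

rows = con.execute(
    "SELECT c.name, o.item FROM customers c "
    "JOIN orders o ON o.customer_id = c.id"
).fetchall()
print(rows)  # e.g. [('Ana', 'tea'), ('Ben', 'coffee')]
```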
Data Migration – The process of moving data between locations, formats, or applications.
Data Mining – The extraction of specific data from large data sets in order to look for patterns and trends.
Data Modelling – The process of creating models of data and the relationships within it. There are three levels of data model: conceptual, logical, and physical. Once these are in place, different techniques can be used for analysis.
Data Structure – Organisation of data in a specific format to be easily accessed and used for specific purposes.
Data Validation – This is the process of ensuring data has been checked and cleansed prior to analysis.
Data Visualisation – The visual representation of data, for example in charts, dashboards, and infographics.
Discrete Variable – A variable whose values are countable in a finite amount of time; for example, you can count the change in your pocket or the money in your bank account.
Dynamic Data – Real-time data that changes and updates around the clock, often refreshed automatically as new information arrives.
E
Extract Data – Data extraction is the process of retrieving data from various sources so that it can be processed further.
G
General Linear Model – A framework used to compare how several variables affect a continuous variable. Statistical tests that sit within this framework include ANOVA, MANOVA, MANCOVA, ordinary linear regression, the t-test, and the F-test.
Generalised Linear Model – An extension of the general linear model that can also handle outcome variables which are not normally distributed, such as counts or yes/no responses.
I
Item Response Theory – Refers to a family of mathematical models that attempt to explain the relationship between latent traits and responses to test items.
L
Linear Regression – An approach that models the linear relationship between a dependent variable and one or more independent variables.
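A minimal sketch of the idea, fitting a straight line to some invented points with NumPy:

```python
# Minimal linear-regression sketch: fit a straight line y = a*x + b
# to hypothetical data by least squares.
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)   # independent variable
y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])     # dependent variable

slope, intercept = np.polyfit(x, y, deg=1)    # least-squares fit
print(f"y = {slope:.2f}x + {intercept:.2f}")  # roughly y = 2x
```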
M
Machine Learning – An application of AI in which a system has the ability to improve and learn from data without being continually programmed.
Manipulate – Making changes to data, such as deleting, editing, adjusting, or cleaning it.
MANOVA – The multivariate analysis of variance, which measures several dependent variables in one test. This increases the probability of finding an important factor.
Metadata – Metadata summarises basic information about data, which can make finding and working with data easier. Some examples of metadata are author, date created, date modified, and file size.
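For instance, this short sketch creates a hypothetical file and then reads two of those fields, file size and date modified, with Python’s standard library:

```python
# Minimal metadata sketch: read basic file metadata (size and last
# modified time) for a hypothetical file created just for the demo.
from datetime import datetime
from pathlib import Path

path = Path("report.csv")        # hypothetical file
path.write_text("a,b\n1,2\n")    # create it so the demo is runnable

info = path.stat()
print("size in bytes:", info.st_size)
print("last modified:", datetime.fromtimestamp(info.st_mtime))
```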
Misclassification – This happens when data is placed in the wrong category, for example because the properties chosen for classification are unsuitable.
Multiple Queries – Queries written to pull a set of data from the data store on multiple occasions, whereas data would normally be pulled once.
O
Obsolete Data – Incorrect data, or data that is no longer used due to being out of date.
Open Data Sets – Large datasets that are available to anyone with an internet connection. This type of data comes from external sources around the world and can be anything from public data collected by government agencies to economic trend roundups from banks and financial conglomerates.
P
Population Mean – This is the average of a group characteristic such as the average age of all the people in the UK.
Q
Qualitative Data – Data that is non-numerical; it doesn’t have to be highly detailed. For example, some of the information you provide in a questionnaire, such as male or female, is simple but not numerical.
Quantitative Data – Numerical data, such as time, date, or age.
Query – A request for data or information.
Query Containment – A relationship between two queries where every result returned by the first query is also returned by the second; the first query is said to be contained in the second.
Query Processing – Translating high-level queries into low-level expressions, which requires the basic concepts of relational algebra.
R
Raw Data – Data pulled directly from the system prior to being manipulated in any way.
S
Single Query – A query that is written in order to return a set result.
Social Network Analytics – The collected information from social networks that show how users share, view or engage with your content or profiles.
Statistical Analysis – The collection and interpretation of data in order to uncover patterns and trends.
Statistical Methods – Different mathematical formulas, models, and techniques used when researching raw data. These extract specific data to allow analysis to take place.
Structural Equation Modelling – A technique used to analyse structural relationships between measured variables.
Supervised Algorithm – Where you tell the algorithm what the correct answer is for each training example, so that it can predict answers for unseen data.
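A minimal sketch with scikit-learn, using invented training examples where the labels (pass or fail) are the “correct answers”:

```python
# Minimal supervised-learning sketch: train on labelled examples,
# then predict labels for unseen data (hypothetical study data).
from sklearn.tree import DecisionTreeClassifier

# [hours studied, hours slept] -> pass (1) / fail (0)
X_train = [[8, 7], [1, 4], [6, 8], [2, 5]]
y_train = [1, 0, 1, 0]

model = DecisionTreeClassifier().fit(X_train, y_train)
print(model.predict([[7, 6], [1, 3]]))  # e.g. [1 0]
```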
T
Time Series Analysis – Looking at data over a specific time frame to find patterns and trends. For example, Baltic may look at the previous year’s starts to see which months had higher inductions.
Time Series Forecasting – Closely linked to time series analysis: once the analysis has revealed patterns, you can use them to forecast future values and make decisions accordingly.
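As a minimal sketch, the example below uses a simple moving average of the last three months as a naive forecast for the next month; the monthly figures are invented:

```python
# Minimal time-series forecasting sketch: a moving average of recent
# months as a naive forecast for the next month (hypothetical data).
import pandas as pd

starts = pd.Series(
    [12, 15, 14, 20, 18, 22],
    index=pd.period_range("2023-01", periods=6, freq="M"),
)

# Naive forecast: the average of the last three observed months.
forecast = starts.tail(3).mean()
print(f"Forecast for next month: {forecast:.1f}")  # 20.0
```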
U
Unstructured Data – Data that is not organised and is harder to analyse, for example a free text field for survey comments.
Unsupervised Algorithm – Where the algorithm has to figure out the answer on its own, without labelled examples, typically by finding structure such as clusters in the data.
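A minimal sketch with scikit-learn’s k-means clustering, which groups some invented, unlabelled points into two clusters on its own:

```python
# Minimal unsupervised-learning sketch: k-means groups unlabelled
# points into clusters without being told the answers (hypothetical data).
from sklearn.cluster import KMeans

points = [[1, 1], [1.5, 2], [8, 8], [8.5, 9], [1, 1.5], [9, 8]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)  # e.g. [0 0 1 1 0 1]
```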
V
Variety – One of the defining characteristics of big data: the range of different data types and sources an organisation handles, from structured tables to free text, images, and logs.
W
Web Analytics – The measurement, collection, analysis, and reporting of web data to further understand and optimise web usage.
Why is it Important To Know Data Analysis Technical Terms?
Data analysts rely on precision. Understanding the technical terminology can unlock fresh approaches to your analysis, help you apply the most appropriate techniques to your work, and make communicating with others in the industry more efficient.
Find Out More:
Did you know that apprenticeships are a great way to take your data knowledge to the next level?
Our Level 3 Junior Data Analyst programme provides a great way to get started in a data role, or if you already have some experience in the industry, our Level 4 Data Analyst programme can help further your knowledge and career.