NeoSchema Reference Guide
This guide is for version 5.0.0-beta.48+
Background Information:
Using Schema in Graph Databases such as Neo4j
User Guide
Tutorial 1 : basic Schema operations (Classes, Properties, Data Nodes)
Tutorial 2 : set up a simple Schema (Classes, Properties) and perform a data import (Data Nodes and relationships among them)
A layer above the class NeoAccess (or, in principle, another library providing a compatible interface), to provide an optional schema to the underlying database.

Schemas may be used to either:
1) acknowledge the existence of typical patterns in the data
OR
2) enforce a mold for the data to conform to

MOTIVATION

Relational databases are suffocatingly strict for the real world. Neo4j by itself may be too anarchic. A schema (whether "lenient/lax/loose" or "strict") in conjunction with Neo4j may be the needed compromise.

GOALS

- Data integrity
- Data filtering upon import
- Assist the User Interface
- Self-documentation of the database
- Graft into graph databases some of the semantic functionality that some people turn to RDF for. However, carving out a new path rather than attempting to emulate RDF!

OVERVIEW

- "Class" nodes capture the abstraction of entities that share similarities. Example: "car", "star", "protein", "patient". In RDFS lingo, a "Class" node is the counterpart of a resource (entity) whose "rdf:type" property has the value "rdfs:Class"
- The "Property" nodes linked to a given "Class" node represent the attributes of the data nodes of that class
- Data nodes are linked to their respective classes by a "SCHEMA" relationship
- Some classes contain an attribute named "code" that identifies the UI code to display/edit them [this might change!], as well as their descendants under the "INSTANCE_OF" relationships. Conceptually, the "code" is a relationship to an entity consisting of software code
- Classes can be of the "S" (Strict) or "L" (Lenient) type. A "lenient" Class will accept data nodes with any properties, whether declared in the Class Schema or not; by contrast, a "strict" Class will reject data nodes that contain properties not declared in the Schema

IMPLEMENTATION DETAILS

- Every node used by this class, as well as the data nodes it manages, has a unique attribute "uri" (formerly "schema_id" and "item_id", respectively); note that this is actually a "token", i.e. a part of a URI - not a full URI. The uri's of schema nodes have the form "schema-n", where n is a unique number. Data nodes can have any unique uri's, with optional prefixes and suffixes chosen by the higher layers. The Schema layer manages the auto-increments for any desired set of namespaces (and itself makes use of the "schema_node" namespace)
- The names of the Classes and Properties are stored in node attributes called "name". We avoid calling them "label", as done in RDFS, because in Labeled Graph Databases like Neo4j, the term "label" has a very specific meaning, and is pervasively used
- For convenience, data nodes contain a label equal to their Class name

AUTHOR: Julian West

----------------------------------------------------------------------------------

MIT License

Copyright (c) 2021-2024 Julian A. West and the BrainAnnex.org project

This file is part of the "Brain Annex" project (https://BrainAnnex.org)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

----------------------------------------------------------------------------------

name | arguments | returns |
---|---|---|
set_database | db :NeoAccess | None |
IMPORTANT: this method MUST be called before using this class!! :param db: Database-interface object, created with the NeoAccess library :return: None |
name | arguments | returns |
---|---|---|
assert_valid_class_name | class_name: str | None |
Raise an Exception if the passed argument is not a valid Class name :param class_name: A string with the putative name of a Schema Class :return: None |
name | arguments | returns |
---|---|---|
is_valid_class_name | class_name: str | bool |
Return True if the passed argument is a valid Class name, or False otherwise :param class_name: A string with the putative name of a Schema Class :return: True if the argument is a valid Class name, or False otherwise |
name | arguments | returns |
---|---|---|
assert_valid_class_identifier | class_node :Union[int, str] | None |
Raise an Exception if the argument is not a valid "identifier" for a Class node, meaning either a valid name or a valid internal database ID :param class_node: Either an integer with the internal database ID of an existing Class node, or a string with its name :return: None (an Exception is raised if the validation fails) |
name | arguments | returns |
---|---|---|
create_class | name :str, code = None, strict = False, no_datanodes = False | (int, str) |
Create a new Class node with the given name and type of schema, provided that the name isn't already in use for another Class. Return a pair with the internal database ID and the auto-incremented uri assigned to the new Class. Raise an Exception if a class by that name already exists. NOTE: if you want to add Properties at the same time that you create a new Class, use the function create_class_with_properties() instead. :param name: Name to give to the new Class :param code: Optional string indicative of the software handler for this Class and its subclasses :param strict: If True, the Class will be of the "S" (Strict) type; otherwise, it'll be of the "L" (Lenient) type. Explained under the comments for the NeoSchema class :param no_datanodes: If True, it means that this Class does not allow data nodes to have a "SCHEMA" relationship to it; typically used by Classes having an intermediate role in the context of other Classes :return: An (int, str) pair with the internal database ID and the unique uri assigned to the node just created, if it was created; an Exception is raised if a class by that name already exists |
name | arguments | returns |
---|---|---|
get_class_internal_id | class_name :str | int |
Returns the internal database ID of the Class node with the given name, or raise an Exception if not found, or if more than one is found. Note: unique Class names are assumed. :param class_name: The name of the desired class :return: The internal database ID of the specified Class |
name | arguments | returns |
---|---|---|
get_class_uri | class_name :str | str |
Returns the Schema uri of the Class with the given name; raise an Exception if not found :param class_name: The name of the desired class :return: The Schema uri of the specified Class |
name | arguments | returns |
---|---|---|
get_class_uri_by_internal_id | internal_class_id: int | str |
Returns the Schema uri of the Class with the given internal database ID; raise an Exception if not found :param internal_class_id: An integer with the internal database ID of the desired Class :return: The Schema uri of the specified Class |
name | arguments | returns |
---|---|---|
class_neo_id_exists | neo_id: int | bool |
Return True if a Class by the given internal database ID already exists, or False otherwise :param neo_id: Integer with internal database ID :return: A boolean indicating whether the specified Class exists |
name | arguments | returns |
---|---|---|
class_uri_exists | schema_uri :str | bool |
Return True if a Class by the given uri already exists, or False otherwise :param schema_uri: The uri of the Class node of interest :return: True if the Class already exists, or False otherwise |
name | arguments | returns |
---|---|---|
class_name_exists | class_name: str | bool |
Return True if a Class by the given name already exists, or False otherwise :param class_name: The name of the class of interest :return: True if the Class already exists, or False otherwise |
name | arguments | returns |
---|---|---|
get_class_name_by_schema_uri | schema_uri :str | str |
Returns the name of the class with the given Schema URI; raise an Exception if not found :param schema_uri: A string uniquely identifying the desired Class :return: The name of the Class with the given Schema uri |
name | arguments | returns |
---|---|---|
get_class_name | internal_id: int | str |
Returns the name of the class with the given internal database ID, or raise an Exception if not found :param internal_id: An integer with the internal database ID of the desired class :return: The name of the class with the given internal database ID; raise an Exception if not found |
name | arguments | returns |
---|---|---|
get_class_attributes | class_internal_id: int | dict |
Returns all the attributes (incl. the name) of the Class node with the given internal database ID, or raise an Exception if the Class is not found. If no "name" attribute is found, an Exception is raised. :param class_internal_id: An integer with the Neo4j ID of the desired class :return: A dictionary of attributes of the class with the given internal database ID; an Exception is raised if not found EXAMPLE: {'name': 'MY CLASS', 'uri': '123', 'strict': False} |
name | arguments | returns |
---|---|---|
get_all_classes | only_names=True | [str] |
Fetch and return a list of all the existing Schema classes - either just their names (sorted alphabetically) (TODO: or a fuller listing - not yet implemented) TODO: disregard capitalization in sorting :return: A list of all the existing Class names |
name | arguments | returns |
---|---|---|
rename_class | old_name :str, new_name :str, rename_data_fields=True | None |
Rename the specified Class. If the Class is not found, an Exception is raised :param old_name: The current name (to be changed) of the Class of interest :param new_name: The new name to give to the above Class :param rename_data_fields: If True (default), the corresponding label in the data nodes of that Class is renamed as well :return: None |
name | arguments | returns |
---|---|---|
delete_class | name: str, safe_delete=True | None |
Delete the given Class AND all its attached Properties. If safe_delete is True (highly recommended), then delete ONLY if there are no data nodes of that Class (i.e., linked to it by way of "SCHEMA" relationships.) :param name: Name of the Class to delete :param safe_delete: Flag indicating whether the deletion is to be restricted to situations where no data node would be left "orphaned". CAUTION: if safe_delete is False, then data nodes may be left without a Schema :return: None. In case of no node deletion, an Exception is raised |
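
The effect of safe_delete can be illustrated with a small in-memory stand-in. This is only a sketch of the documented behavior, not the NeoSchema implementation: the two dictionaries and the function name `delete_class_sketch` are hypothetical, introduced here for illustration.

```python
def delete_class_sketch(classes: dict, data_nodes: dict, name: str,
                        safe_delete: bool = True) -> None:
    """
    Hypothetical in-memory model (NOT the library's code).

    classes:    maps a Class name to the list of its Property names
    data_nodes: maps a data-node uri to the name of its Class
                (a stand-in for the "SCHEMA" relationships)
    """
    if name not in classes:
        raise Exception(f"No Class named `{name}` found; nothing deleted")

    if safe_delete and name in data_nodes.values():
        # Deleting now would leave data nodes without a Schema
        raise Exception(f"Class `{name}` still has data nodes; nothing deleted")

    # The Class and its attached Properties are removed together
    del classes[name]


classes = {"car": ["color", "make"], "star": []}
data_nodes = {"c-1": "car"}

delete_class_sketch(classes, data_nodes, "star")      # no data nodes: deleted
```

With the default safe_delete=True, a call such as `delete_class_sketch(classes, data_nodes, "car")` raises an Exception in this model, because the data node "c-1" would otherwise be orphaned.
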
name | arguments | returns |
---|---|---|
is_strict_class | name :str | bool |
Return True if the given Class is of "Strict" type, or False otherwise (or if the information is missing). If no Class by that name exists, an Exception is raised :param name: The name of a Schema Class node :return: True if the Class is "strict" or False if not (i.e., if it's "lax") |
name | arguments | returns |
---|---|---|
is_strict_class_fast | class_internal_id: int, schema_cache=None | bool |
Return True if the given Class is of "Strict" type, or False otherwise (or if the information is missing) :param class_internal_id: The internal ID of a Schema Class node :param schema_cache: (OPTIONAL) "SchemaCache" object :return: True if the Class is "strict" or False if not (i.e., if it's "lax") |
name | arguments | returns |
---|---|---|
allows_data_nodes | class_name = None, class_internal_id = None, schema_cache=None | bool |
Determine if the given Class allows data nodes directly linked to it :param class_name: Name of the Class :param class_internal_id: (OPTIONAL) Alternate way to specify the class; if both specified, this one prevails :param schema_cache: (OPTIONAL) "SchemaCache" object :return: True if allowed, or False if not. If the Class doesn't exist, raise an Exception |
name | arguments | returns |
---|---|---|
assert_valid_relationship_name | rel_name :str | None |
Raise an Exception if the passed argument is not a valid name for a database relationship :param rel_name: A string with the relationship (link) name whose validity we want to check :return: None |
name | arguments | returns |
---|---|---|
create_class_relationship | from_class: Union[int, str], to_class: Union[int, str], rel_name="INSTANCE_OF", use_link_node=False, link_properties=None | None |
Create a relationship (provided that it doesn't already exist) with the specified name between the 2 existing Class nodes (identified by names or by their internal database IDs), in the ( from -> to ) direction. Note: multiple relationships by the same name between the same nodes are allowed by Neo4j, as long as the relationships differ in their attributes (but this method doesn't allow setting properties on the new relationship) :param from_class: Either an integer with the internal database ID of an existing Class node, or a string with its name. Used to identify the node from which the new relationship originates. :param to_class: Either an integer with the internal database ID of an existing Class node, or a string with its name. Used to identify the node to which the new relationship terminates. :param rel_name: Name of the relationship to create, in the from -> to direction (blanks allowed) :param use_link_node: If True, insert an intermediate "LINK" node in the newly-created relationship; otherwise, simply create a direct link. Note: if rel_name has the special value "INSTANCE_OF", this argument must be False :param link_properties: [OPTIONAL] List of Property names to attach to the newly-created link. Note: if link_properties is specified, then use_link_node is automatically True :return: None |
name | arguments | returns |
---|---|---|
rename_class_rel | from_class: int, to_class: int, new_rel_name | bool |
#### TODO: NOT IN CURRENT USE Rename the old relationship between the specified classes TODO: if more than 1 relationship exists between the given Classes, then they will all be replaced?? TO FIX! (the old name ought to be provided) :param from_class: :param to_class: :param new_rel_name: :return: True if another relationship was found, and successfully renamed; otherwise, False |
name | arguments | returns |
---|---|---|
delete_class_relationship | from_class: str, to_class: str, rel_name | int |
Delete the relationship(s) with the specified name between the 2 existing Class nodes (identified by their respective names), going in the from -> to direction. In case of error or if no relationship was found, an Exception is raised. Note: there might be more than one - relationships with the same name between the same nodes are allowed, provided that they have different properties. If more than one is found, they will all be deleted. The number of relationships deleted will be returned :param from_class: Name of one existing Class node (blanks allowed in name) :param to_class: Name of another existing Class node (blanks allowed in name) :param rel_name: Name of the relationship(s) to delete, if found in the from -> to direction (blanks allowed in name) :return: The number of relationships deleted. In case of error, or if no relationship was found, an Exception is raised |
name | arguments | returns |
---|---|---|
unlink_classes | class1 :Union[int, str], class2 :Union[int, str] | int |
Remove ALL relationships (in any direction) between the specified Classes :param class1: Either the integer internal database ID, or name, to identify the first Class :param class2: Either the integer internal database ID, or name, to identify the second Class :return: The number of relationships deleted (possibly zero) |
name | arguments | returns |
---|---|---|
class_relationship_exists | from_class: str, to_class: str, rel_name | bool |
Return True if a relationship with the specified name exists between the two given Classes, in the specified direction. The Schema allows several scenarios: - A direct relationship from one Class node to the other - A relationship that goes thru an intermediary "LINK" node - Either of the 2 above scenarios, but between "ancestors" of the two nodes; "ancestors" are defined by means of following any number of "INSTANCE_OF" hops to other Class nodes SEE ALSO: is_link_allowed() :param from_class: Name of an existing Class node (blanks allowed in name) :param to_class: Name of another existing Class node (blanks allowed in name) :param rel_name: Name of the relationship to look for in the from -> to direction (blanks allowed in name) :return: True if the Class relationship exists, or False otherwise |
name | arguments | returns |
---|---|---|
get_class_instances | class_name: str, leaf_only=False | [str] |
Get the names of all Classes that are, directly or indirectly, instances of the given Class, i.e. pointing to that node thru a series of 1 or more "INSTANCE_OF" relationships; if leaf_only is True, then only as long as they are leaf nodes (with no other Class that is an instance of them.) :param class_name: Name of the Class for which we want to find other Classes that are an instance of it :param leaf_only: If True, only return the leaf nodes (those that don't have other Classes that are instances of them) :return: A list of Class names |
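
The traversal described above can be sketched on a plain dictionary. This is an illustrative model only, not the query that NeoSchema runs against the database: the `instance_of` mapping and the function name are hypothetical, and the sorted output order is a choice made here for readability.

```python
def get_class_instances_sketch(instance_of: dict, class_name: str,
                               leaf_only: bool = False) -> list:
    """
    Hypothetical in-memory model (NOT the library's code).

    instance_of: maps a Class name to the list of Classes that it points to
                 with an outbound "INSTANCE_OF" relationship
    """
    # Invert the edges: for each Class, which Classes point directly to it?
    children = {}
    for child, parents in instance_of.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    # Follow 1 or more "INSTANCE_OF" hops backwards from the given Class
    found = []
    stack = list(children.get(class_name, []))
    while stack:
        current = stack.pop()
        if current in found:
            continue                    # already visited
        found.append(current)
        stack.extend(children.get(current, []))

    if leaf_only:
        # Keep only Classes that no other Class is an instance of
        found = [c for c in found if not children.get(c)]

    return sorted(found)


instance_of = {"sedan": ["car"], "car": ["vehicle"], "truck": ["vehicle"]}

all_instances = get_class_instances_sketch(instance_of, "vehicle")
leaves = get_class_instances_sketch(instance_of, "vehicle", leaf_only=True)
```

In this toy schema, "car" is an instance of "vehicle" but is not a leaf, because "sedan" is in turn an instance of "car".
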
name | arguments | returns |
---|---|---|
get_linked_class_names | class_name: str, rel_name: str, enforce_unique=False | Union[str, List[str]] |
Given a Class, specified by its name, locate and return the name(s) of the other Class(es) that it's linked to by means of the relationship with the specified name. Typically, the result will contain no more than 1 name, but it could be more; it's probably a bad design to use the same relationship name to connect a class to multiple other classes (though currently allowed.) Relationships are followed in the OUTbound direction only. :param class_name: Name of a Class in the schema :param rel_name: Name of relationship to follow (in the OUTbound direction) from the above Class :param enforce_unique: If True, it raises an Exception if the number of results isn't exactly one :return: If enforce_unique is True, return a string with the class name; otherwise, return a list of names (typically just one) |
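
The two return shapes controlled by enforce_unique can be sketched as follows. This is a hedged illustration over a hypothetical list of (from_class, rel_name, to_class) triples; it is not the library's implementation, and the function name is made up for the example.

```python
def get_linked_class_names_sketch(links: list, class_name: str, rel_name: str,
                                  enforce_unique: bool = False):
    """
    Hypothetical in-memory model (NOT the library's code).

    links: list of (from_class, rel_name, to_class) triples
    """
    # Only OUTbound relationships from the given Class are followed
    names = [to_class for from_class, name, to_class in links
             if from_class == class_name and name == rel_name]

    if enforce_unique:
        if len(names) != 1:
            raise Exception(f"Expected exactly 1 linked Class, found {len(names)}")
        return names[0]         # a single string

    return names                # a list (typically with just one element)


links = [("patient", "IS_ATTENDED_BY", "doctor"),
         ("patient", "HAS_RESULT", "result")]

as_list = get_linked_class_names_sketch(links, "patient", "HAS_RESULT")
as_string = get_linked_class_names_sketch(links, "patient", "HAS_RESULT",
                                          enforce_unique=True)
```
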
name | arguments | returns |
---|---|---|
get_class_relationships | class_name :str, link_dir="BOTH", omit_instance=False | Union[dict, list] |
Fetch and return the names of all the relationships (both inbound and outbound) attached to the given Class. Treat separately the inbound and the outbound ones. If the Class doesn't exist, empty lists are returned. :param class_name: The name of the desired Class :param link_dir: Desired direction(s) of the relationships; one of "BOTH" (default), "IN" or "OUT" :param omit_instance: If True, the common outbound relationship "INSTANCE_OF" is omitted :return: If link_dir is "BOTH", return a dictionary of the form {"in": list of inbound-relationship names, "out": list of outbound-relationship names} Otherwise, just return the inbound or outbound list, based on the value of link_dir |
name | arguments | returns |
---|---|---|
get_class_outbound_data | class_neo_id :int, omit_instance=False | dict |
Efficient all-at-once query to fetch and return the names of all the outbound relationships attached to the given Class, as well as the names of the other Classes on the other side of those links. IMPORTANT: it's probably a bad design to use the same relationship name to connect a class to multiple other classes. Though currently allowed in the Schema, this particular method assumes - and enforces - uniqueness :param class_neo_id: An integer to identify the desired Class :param omit_instance: If True, the common outbound relationship "INSTANCE_OF" is omitted :return: A (possibly empty) dictionary, where the keys are the names of outbound relationships, and the values are the names of the Class nodes on the other side of those links. An Exception will be raised if link names are not unique [though currently allowed by the Schema] EXAMPLE: {'IS_ATTENDED_BY': 'doctor', 'HAS_RESULT': 'result'} |
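
The shape of the returned dictionary, and the uniqueness check, can be sketched in memory. This is an assumption-laden illustration, not the library's query: the triples list and the function name are hypothetical, and the Class is identified here by name rather than by its internal database ID.

```python
def outbound_data_sketch(links: list, class_name: str,
                         omit_instance: bool = False) -> dict:
    """
    Hypothetical in-memory model (NOT the library's code).

    links: list of (from_class, rel_name, to_class) triples
    """
    result = {}
    for from_class, rel_name, to_class in links:
        if from_class != class_name:
            continue                    # not an outbound link of our Class
        if omit_instance and rel_name == "INSTANCE_OF":
            continue
        if rel_name in result:
            # Same relationship name used towards 2 different Classes
            raise Exception(f"Relationship name `{rel_name}` is not unique")
        result[rel_name] = to_class
    return result


links = [("patient", "IS_ATTENDED_BY", "doctor"),
         ("patient", "HAS_RESULT", "result"),
         ("patient", "INSTANCE_OF", "person")]

outbound = outbound_data_sketch(links, "patient", omit_instance=True)
```

Here `outbound` matches the EXAMPLE given in the method description; with omit_instance=False, the "INSTANCE_OF" entry pointing to "person" would be included as well.
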
name | arguments | returns |
---|---|---|
get_class_properties | class_node: Union[int, str], include_ancestors=False, sort_by_path_len="ASC", exclude_system=False | [str] |
Return the list of all the names of the Properties associated with the given Class (including those inherited thru ancestor nodes by means of "INSTANCE_OF" relationships, if include_ancestors is True), sorted by the schema-specified position (or, optionally, by path length) EXAMPLES: get_class_properties(class_node="Quote", include_ancestors=False) => ['quote', 'attribution', 'notes'] NeoSchema.get_class_properties(class_node="Quote", include_ancestors=True, exclude_system=False) => ['quote', 'attribution', 'notes', 'uri'] NeoSchema.get_class_properties(class_node="Quote", include_ancestors=True, sort_by_path_len="DESC", exclude_system=False) => ['uri', 'quote', 'attribution', 'notes'] NeoSchema.get_class_properties(class_node="Quote", include_ancestors=True, exclude_system=True) => ['quote', 'attribution', 'notes'] :param class_node: Either an integer with the internal database ID of an existing Class node, or a string with its name :param include_ancestors: If True, also include the Properties attached to Classes that are ancestral to the given one by means of a chain of outbound "INSTANCE_OF" relationships Note: the sorting by relationship index won't mean much if ancestral nodes are included, with their own indexing of relationships; if order matters in those cases, use the "sort_by_path_len" argument, below :param sort_by_path_len: Only applicable if include_ancestors is True. If provided, it must be either "ASC" or "DESC", and it will sort the results by path length (either ascending or descending), before sorting by the schema-specified position for each Class. Note: with "ASC", the immediate Properties of the given Class will be listed first :param exclude_system: [OPTIONAL] If True, Property nodes with the attribute "system" set to True will be excluded; default is False :return: A list of the Properties of the specified Class (including indirect Properties, if include_ancestors is True) |
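
The interplay of include_ancestors, sort_by_path_len and exclude_system can be sketched with an in-memory schema. This is a hedged model, not the library's implementation. The dictionary layout and the function name are hypothetical, and so is the ancestor Class "Content Item" holding the system Property "uri"; that arrangement is assumed here only so that the EXAMPLES above can be reproduced.

```python
def get_class_properties_sketch(schema: dict, class_name: str,
                                include_ancestors: bool = False,
                                sort_by_path_len: str = "ASC",
                                exclude_system: bool = False) -> list:
    """
    Hypothetical in-memory model (NOT the library's code).

    schema: maps a Class name to a dict with the keys
            "properties":  list of (name, is_system) pairs, in schema order
            "instance_of": list of parent Class names
    """
    # Breadth-first walk over "INSTANCE_OF" links, recording each path length
    path_len = {class_name: 0}
    queue = [class_name]
    while queue and include_ancestors:
        current = queue.pop(0)
        for parent in schema[current]["instance_of"]:
            if parent not in path_len:
                path_len[parent] = path_len[current] + 1
                queue.append(parent)

    # First sort the Classes by path length, then keep each Class' own order
    ordered = sorted(path_len, key=path_len.get,
                     reverse=(sort_by_path_len == "DESC"))

    result = []
    for cls in ordered:
        for prop_name, is_system in schema[cls]["properties"]:
            if not (exclude_system and is_system):
                result.append(prop_name)
    return result


schema = {
    "Quote": {"properties": [("quote", False), ("attribution", False), ("notes", False)],
              "instance_of": ["Content Item"]},
    "Content Item": {"properties": [("uri", True)], "instance_of": []},
}

asc = get_class_properties_sketch(schema, "Quote", include_ancestors=True)
desc = get_class_properties_sketch(schema, "Quote", include_ancestors=True,
                                   sort_by_path_len="DESC")
```

With "ASC" the Properties of "Quote" itself come first, while with "DESC" the Properties of the most distant ancestor lead the list.
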
name | arguments | returns |
---|---|---|
add_properties_to_class | class_node = None, class_uri = None, property_list = None | int |
Add a list of Properties to the specified (ALREADY-existing) Class. The properties are given an inherent order (an attribute named "index", starting at 1), based on the order they appear in the list. If other Properties already exist, the existing numbering gets extended. NOTE: if the Class doesn't already exist, use create_class_with_properties() instead; attempting to add properties to a non-existing Class will result in an Exception :param class_node: An integer with the internal database ID of an existing Class node, or a string with its name :param class_uri: (OPTIONAL) String with the schema_uri of the Class to which to attach the given Properties TODO: remove :param property_list: A list of strings with the names of the properties, in the desired order. Whitespace in any of the names gets stripped out. If any name is a blank string, an Exception is raised. If the list is empty, an Exception is raised :return: The number of Properties added |
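
The numbering rule (indexes start at 1 and extend any existing numbering) can be sketched as below. This is an illustration under stated assumptions, not the library's code: the list of (index, name) pairs and the function name are hypothetical, and "whitespace gets stripped out" is interpreted here as removing leading and trailing whitespace.

```python
def add_properties_sketch(existing: list, property_list: list) -> int:
    """
    Hypothetical in-memory model (NOT the library's code).

    existing: list of (index, name) pairs already attached to the Class;
              it is extended in place
    """
    if not property_list:
        raise Exception("The list of Properties is empty")

    # ASSUMPTION: only leading/trailing whitespace is stripped
    cleaned = [name.strip() for name in property_list]
    if any(name == "" for name in cleaned):
        raise Exception("Blank Property names are not allowed")

    # Continue from the highest index in use (or start at 1)
    next_index = max((index for index, _ in existing), default=0) + 1
    for offset, name in enumerate(cleaned):
        existing.append((next_index + offset, name))

    return len(cleaned)


props = []
first_batch = add_properties_sketch(props, ["quote", " attribution "])
second_batch = add_properties_sketch(props, ["notes"])
```

After the two calls, "notes" carries index 3, because the second batch extends the numbering of the first one.
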
name | arguments | returns |
---|---|---|
set_property_attribute | class_name :str, prop_name :str, attribute_name :str, attribute_value | None |
Set an attribute on an existing "PROPERTY" node of the specified Class EXAMPLES: set_property_attribute(class_name="Content Item", prop_name="uri", attribute_name="system", attribute_value=True) set_property_attribute(class_name="User", prop_name="admin", attribute_name="dtype", attribute_value="boolean") set_property_attribute(class_name="User", prop_name="user_id", attribute_name="dtype", attribute_value="integer") set_property_attribute(class_name="User", prop_name="username", attribute_name="required", attribute_value=True) :param class_name: The name of an existing CLASS node :param prop_name: The name of an existing PROPERTY node :param attribute_name: The name of an attribute (field) of the PROPERTY node :param attribute_value: The value to give to the above attribute (field) of the PROPERTY node; if a value was already set, it will be over-written :return: None |
name | arguments | returns |
---|---|---|
create_class_with_properties | name :str, properties :[str], code=None, strict=False, class_to_link_to=None, link_name="INSTANCE_OF", link_dir="OUT" | (int, str) |
Create a new Class node, with the specified name, and also create the specified Property nodes, and link them together with "HAS_PROPERTY" relationships. Return the internal database ID and the auto-incremented unique ID ("schema ID") assigned to the new Class. Each Property node is also assigned a unique "schema ID"; the "HAS_PROPERTY" relationships are assigned an auto-increment index, representing the default order of the Properties. If a class_to_link_to name is specified, link the newly-created Class node to that existing Class node, using an outbound relationship with the specified name. Typically used to create "INSTANCE_OF" relationships from new Classes. If a Class with the given name already exists, nothing is done, and an Exception is raised. NOTE: if the Class already exists, use add_properties_to_class() instead :param name: String with name to assign to the new class :param properties: List of strings with the names of the Properties, in their default order (if that matters) :param code: Optional string indicative of the software handler for this Class and its subclasses. :param strict: If True, the Class will be of the "Strict" type; otherwise, it'll be of the "Lenient" type :param class_to_link_to: If this name is specified, and a link_name (below) is also specified, then create an OUTBOUND relationship from the newly-created Class to this existing Class :param link_name: Name to use for the above relationship, if requested. Default is "INSTANCE_OF" :param link_dir: Desired direction(s) of the relationships: either "OUT" (default) or "IN" :return: If successful, the pair (internal database ID, string "schema_uri" assigned to the new Class); otherwise, raise an Exception |
name | arguments | returns |
---|---|---|
remove_property_from_class | class_uri :str, property_uri :str | None |
Take out the specified (single) Property from the given Class. If the Class or Property was not found, an Exception is raised :param class_uri: The uri of the Class node :param property_uri: The uri of the Property node :return: None |
name | arguments | returns |
---|---|---|
rename_property | old_name :str, new_name :str, class_name :str, rename_data_fields=True | None |
Rename the specified (single) Property from the given Class. If the Class or Property is not found, an Exception is raised :param old_name: The current name (to be changed) of the Property of interest :param new_name: The new name to give to the above Property :param class_name: The name of the Class node to which the Property is attached :param rename_data_fields: If True (default), the field names in the data nodes of that Class are renamed as well (NOT YET IMPLEMENTED) :return: None |
name | arguments | returns |
---|---|---|
is_property_allowed | property_name :str, class_name :str | bool |
Return True if the given Property is allowed by the specified Class, or False otherwise. For a Property to be allowed, at least one of the following must hold: A) the Class isn't strict (i.e. every property is allowed) OR B) the Property has been registered with the Schema, for that Class OR C) the Property has been registered with the Schema, for an ancestral Class - reachable from our given Class thru a chain of "INSTANCE_OF" relationships It's permissible for the specified Class not to exist; in that case, False will be returned (TODO: may be better to raise an Exception in such cases!) :param property_name: Name of a Property (i.e. a field name) whose permissibility we want to check :param class_name: Name of a Class in the Schema :return: True if the given Property is allowed by the specified Class, or False otherwise |
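
Rules A, B and C above can be sketched on an in-memory schema. This is a hedged model of the documented logic, not the library's implementation; the dictionary layout and the function name are hypothetical.

```python
def is_property_allowed_sketch(schema: dict, property_name: str,
                               class_name: str) -> bool:
    """
    Hypothetical in-memory model (NOT the library's code).

    schema: maps a Class name to a dict with the keys
            "strict" (bool), "properties" (list of names),
            "instance_of" (list of parent Class names)
    """
    if class_name not in schema:
        return False                    # a non-existent Class allows nothing

    if not schema[class_name]["strict"]:
        return True                     # rule A: lenient Classes accept anything

    # Rules B and C: look on the Class itself, then on its ancestors
    to_visit, seen = [class_name], set()
    while to_visit:
        current = to_visit.pop()
        if current in seen:
            continue
        seen.add(current)
        if property_name in schema[current]["properties"]:
            return True
        to_visit.extend(schema[current]["instance_of"])

    return False


schema = {
    "Content Item": {"strict": True, "properties": ["uri"], "instance_of": []},
    "Quote": {"strict": True, "properties": ["quote"], "instance_of": ["Content Item"]},
    "Scratch": {"strict": False, "properties": [], "instance_of": []},
}

inherited = is_property_allowed_sketch(schema, "uri", "Quote")      # rule C
undeclared = is_property_allowed_sketch(schema, "color", "Quote")   # no rule applies
```
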
name | arguments | returns |
---|---|---|
is_link_allowed | link_name :str, from_class :str, to_class :str | bool |
Return True if the given Link is allowed between the specified Classes (in the given direction), or False otherwise. For a Link to be allowed, at least one of the following must hold: A) NEITHER of the Classes is strict (in which case any arbitrary link is allowed!) OR B) the Link has been registered with the Schema, for those Classes (possibly going thru intermediate "INSTANCE_OF" hops) Note: links being allowed is inherited from other Classes that are ancestors of the given Class thru "INSTANCE_OF" relationships. If either of the specified Classes doesn't exist, an Exception is raised :param link_name: Name of a Link (i.e. relationship) whose permissibility we want to check :param from_class: Name of a Class that we want to check whether the given Link can originate from :param to_class: Name of a Class that we want to check whether the given Link can terminate into :return: True if the given Link is allowed by the specified Classes, or False otherwise |
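
The two conditions for a Link to be allowed can be sketched in memory. This is an illustrative model, not the library's implementation: the dictionary layout, the set of (from, name, to) triples and both function names are hypothetical, and intermediary "LINK" nodes are left out for brevity.

```python
def ancestors_sketch(schema: dict, class_name: str) -> list:
    """Return the Class itself plus every Class reachable thru "INSTANCE_OF" hops"""
    result, to_visit = [], [class_name]
    while to_visit:
        current = to_visit.pop()
        if current not in result:
            result.append(current)
            to_visit.extend(schema[current]["instance_of"])
    return result


def is_link_allowed_sketch(schema: dict, links: set, link_name: str,
                           from_class: str, to_class: str) -> bool:
    """
    Hypothetical in-memory model (NOT the library's code).

    schema: maps a Class name to {"strict": bool, "instance_of": [parent names]}
    links:  set of (from_class, link_name, to_class) triples registered in the Schema
    """
    for name in (from_class, to_class):
        if name not in schema:
            raise Exception(f"Class `{name}` does not exist")

    # Rule A: if both Classes are lenient, any link is allowed
    if not schema[from_class]["strict"] and not schema[to_class]["strict"]:
        return True

    # Rule B: the link is registered for the Classes or for their ancestors
    return any((f, link_name, t) in links
               for f in ancestors_sketch(schema, from_class)
               for t in ancestors_sketch(schema, to_class))


schema = {
    "person": {"strict": True, "instance_of": []},
    "patient": {"strict": True, "instance_of": ["person"]},
    "doctor": {"strict": True, "instance_of": []},
    "note": {"strict": False, "instance_of": []},
    "tag": {"strict": False, "instance_of": []},
}
links = {("person", "IS_ATTENDED_BY", "doctor")}

by_inheritance = is_link_allowed_sketch(schema, links, "IS_ATTENDED_BY", "patient", "doctor")
```

In this toy schema, "patient" may use "IS_ATTENDED_BY" only because the link is registered on its ancestor "person".
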
name | arguments | returns |
---|---|---|
allowable_props | class_internal_id: int, requested_props: dict, silently_drop: bool, schema_cache=None | dict |
If any of the properties in the requested list of properties is not a declared (and thus allowed) Schema property, then: 1) if silently_drop is True, drop that property from the returned pared-down list 2) if silently_drop is False, raise an Exception :param class_internal_id: The internal database ID of a Schema Class node :param requested_props: A dictionary of properties one wishes to assign to a new data node, if the Schema allows :param silently_drop: If True, any requested properties not allowed by the Schema are simply dropped; otherwise, an Exception is raised if any property isn't allowed :param schema_cache: (OPTIONAL) "SchemaCache" object :return: A possibly pared-down version of the requested_props dictionary |
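
The effect of silently_drop can be sketched with plain dictionaries. This is a minimal illustration, not the library's implementation: the `declared` set stands in for the Properties that the Schema declares for the Class, and the function name is hypothetical.

```python
def allowable_props_sketch(declared: set, requested_props: dict,
                           silently_drop: bool) -> dict:
    """
    Hypothetical in-memory model (NOT the library's code).

    declared: names of the Properties declared in the Schema for the Class
    """
    allowed = {}
    for key, value in requested_props.items():
        if key in declared:
            allowed[key] = value
        elif not silently_drop:
            raise Exception(f"Property `{key}` is not declared in the Schema")
        # else: the undeclared property is simply dropped
    return allowed


declared = {"quote", "attribution"}
requested = {"quote": "Carpe diem", "mood": "upbeat"}

pared_down = allowable_props_sketch(declared, requested, silently_drop=True)
```

With silently_drop=False, the same request raises an Exception in this model, because "mood" is not a declared Property.
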
name | arguments | returns |
---|---|---|
get_schema_code | class_name: str | str |
Obtain the "schema code" of a Class, specified by its name. The "schema code" is an optional but convenient text code, stored either on a Class node, or on any of its ancestors by way of "INSTANCE_OF" relationships :return: A string with the Schema code (empty string if not found) EXAMPLE: "i" |
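
The lookup of a "schema code" on a Class or on its ancestors can be sketched as follows. This is a hedged model, not the library's implementation: the dictionary layout and the function name are hypothetical, and the choice to return the code of the nearest ancestor is an assumption made here, since the description does not say which code prevails when several ancestors carry one.

```python
def get_schema_code_sketch(schema: dict, class_name: str) -> str:
    """
    Hypothetical in-memory model (NOT the library's code).

    schema: maps a Class name to a dict with the key "instance_of"
            (list of parent Class names) and an optional key "code"
    """
    frontier, seen = [class_name], set()
    while frontier:
        next_frontier = []
        for current in frontier:
            if current in seen:
                continue
            seen.add(current)
            code = schema[current].get("code")
            if code:
                return code             # ASSUMPTION: the nearest code wins
            next_frontier.extend(schema[current]["instance_of"])
        frontier = next_frontier        # move 1 "INSTANCE_OF" hop further away
    return ""                           # no code found anywhere


schema = {
    "Media": {"code": "i", "instance_of": []},
    "Photo": {"instance_of": ["Media"]},
    "Plain": {"instance_of": []},
}

inherited_code = get_schema_code_sketch(schema, "Photo")
missing_code = get_schema_code_sketch(schema, "Plain")
```
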
name | arguments | returns |
---|---|---|
get_schema_uri | schema_code :str | str |
Get the Schema URI most directly associated to the given Schema Code :return: A string with the Schema uri (or "" if not present) |
name | arguments | returns |
---|---|---|
all_properties | label :str, primary_key_name :str, primary_key_value | [str] |
Return the list of the *names* of all the Properties associated with the specified DATA node, based on the Schema it is associated with, sorted by their schema-specified position. The desired node is identified by specifying which one of its attributes is a primary key, and providing a value for it. IMPORTANT: this function returns the NAMES of the Properties; not their values :param label: :param primary_key_name: A field name used to identify our desired Data Node :param primary_key_value: The corresponding field value to identify our desired Data Node :return: A list of the names of the Properties associated with the given DATA node |
name | arguments | returns |
---|---|---|
get_data_node_internal_id | uri :str, label=None | int |
Returns the internal database ID of the given Data Node, specified by the value of its uri attribute (and optionally by a label) :param uri: A string to identify a Data Node by the value of its "uri" attribute :param label: (OPTIONAL) String to require the Data Node to have (redundant, since "uri" already uniquely specifies a Data Node - but could be used for speed or data integrity) :return: The internal database ID of the specified Data Node; if none (or more than one) found, an Exception is raised |
name | arguments | returns |
---|---|---|
get_data_node_id | key_value :str, key_name="uri" | int |
Get the internal database ID of a Data Node, given some other primary key :param key_value: The value of the primary key to use for the node lookup :param key_name: The name of the above primary key (by default, "uri") :return: The internal database ID of the specified Data Node |
name | arguments | returns |
---|---|---|
data_node_exists | node_id: Union[int, str], id_key=None, class_name=None | bool |
Return True if the specified Data Node exists, or False otherwise. :param node_id: Either an internal database ID or a primary key value :param id_key: [OPTIONAL] Name of a primary key used to identify the data node; for example, "uri". Leave blank to use the internal database ID :param class_name: [OPTIONAL] Used for a stricter check :return: True if the specified Data Node exists, or False otherwise |
name | arguments | returns |
---|---|---|
data_link_exists | node_1_id, node_2_id, rel_name :str, id_key=None | bool |
Return True if the specified Data Link exists, or False otherwise. :return: True if the specified Data Link exists, or False otherwise |
name | arguments | returns |
---|---|---|
get_data_node | class_name :str, node_id, id_key=None | Union[dict, None] |
Locate a Data Node from its Class name, and a unique identifier :param class_name: The name of the Schema Class that this Data Node is associated to :param node_id: Either an internal database ID or a primary key value :param id_key: OPTIONAL - name of a primary key used to identify the data node; for example, "uri". Leave blank to use the internal database ID :return: |
name | arguments | returns |
---|---|---|
search_data_node | uri = None, internal_id = None, labels=None, properties=None | Union[dict, None] |
Return a dictionary with all the key/value pairs of the attributes of given data node See also locate_node() :param uri: The "uri" field to uniquely identify the data node :param internal_id: OPTIONAL alternate way to specify the data node; if present, it takes priority :param labels: OPTIONAL (generally redundant) ways to locate the data node :param properties: OPTIONAL (generally redundant) ways to locate the data node :return: A dictionary with all the key/value pairs, if node is found; or None if not |
name | arguments | returns |
---|---|---|
locate_node | node_id: Union[int, str], id_type=None, labels=None, dummy_node_name="n" | CypherMatch |
EXPERIMENTAL - a generalization of get_data_node() Return the "match" structure to later use to locate a node identified either by its internal database ID (default), or by a primary key (with optional label.) NOTE: No database operation is actually performed. :param node_id: This is understood to be the Neo4j ID, unless an id_type is specified :param id_type: For example, "uri"; if not specified, the node ID is assumed to be a Neo4j ID :param labels: (OPTIONAL) Labels - a string or list/tuple of strings - for the node :param dummy_node_name: (OPTIONAL) A string with a name by which to refer to the node (by default, "n") :return: A "CypherMatch" object |
name | arguments | returns |
---|---|---|
get_all_data_nodes_of_class | class_name :str | list[dict] |
Return all the values stored in all the Data Nodes in the specified Class. The values comprise all node fields, the internal database ID and the node labels. EXAMPLE: [{'year': 2023, 'make': 'Ford', 'internal_id': 123, 'neo4j_labels': ['Motor Vehicle']}, {'year': 2013, 'make': 'Toyota', 'internal_id': 4, 'neo4j_labels': ['Motor Vehicle']} ] :param class_name: The name of a Class in the Schema :return: A list of dicts; each list item contains data from a node |
name | arguments | returns |
---|---|---|
class_of_data_node | node_id, id_key=None, labels=None | str |
Return the name of the Class of the given data node, identified either by its internal database ID (default), or by a primary key (such as "uri"), with optional labels :param node_id: Either an internal database ID or a primary key value :param id_key: OPTIONAL - name of a primary key used to identify the data node; for example, "uri". Leave blank to use the internal database ID :param labels: Optional string, or list/tuple of strings, with internal database labels (DEPRECATED) :return: A string with the name of the Class of the given data node |
name | arguments | returns |
---|---|---|
data_nodes_of_class | class_name :str, return_option="uri" | Union[List[str], List[int]] |
Return the uri's, or alternatively the internal database ID's, of all the Data Nodes of the given Class :param class_name: Name of a Schema Class :param return_option: Either "uri" or "internal_id" :return: Return the uri's or internal database ID's of all the Data Nodes of the given Class |
name | arguments | returns |
---|---|---|
count_data_nodes_of_class | class_name: str | [int] |
Return the count of all the Data Nodes attached to the given Class. If the Class doesn't exist, an Exception is raised :param class_name: The name of the Schema Class of interest :return: The count of all the Data Nodes attached to the given Class |
name | arguments | returns |
---|---|---|
data_nodes_lacking_schema | label :str | [dict] |
Locate and return all nodes with the given label that aren't associated to any Schema Class :param label: A string with a graph-database label :return: A list containing a single dictionary, with key 'n'; the value is a dict with all the properties of the located nodes |
name | arguments | returns |
---|---|---|
follow_links | class_name :str, node_id, link_name :str, id_key=None, properties=None, labels=None | List |
From the given starting node, follow all the relationships that have the specified name, from/into neighbor nodes (optionally having the given labels), and return some of the properties of those found nodes. :param class_name: String with the name of the Class of the given data node :param node_id: Either an internal database ID or a primary key value :param link_name: A string with the name of the link(s) to follow :param id_key: [OPTIONAL] Name of a primary key used to identify the data node; for example, "uri"; use None to refer to the internal database ID :param properties: [OPTIONAL] String, or list/tuple of strings, with the name(s) of the properties to return on the found nodes; if not specified, ALL properties are returned :param labels: [OPTIONAL] string, or list/tuple of strings, with node labels required to be present on the neighbor nodes TODO: not currently in use :return: A (possibly empty) list of values, if properties only contains a single element; otherwise, a list of dictionaries |
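
The two return shapes of follow_links() (a flat list of values for a single property, a list of dictionaries otherwise) can be sketched as follows. This is only an illustration: `shape_results` is a hypothetical helper, the neighbor nodes are plain dicts rather than database records, and returning None for a missing property is an assumption.

```python
def shape_results(neighbor_nodes: list, properties=None) -> list:
    """Mimic the documented return shapes, given the already-located neighbor nodes."""
    if properties is None:
        return [dict(node) for node in neighbor_nodes]      # ALL properties
    if isinstance(properties, str):
        properties = [properties]
    if len(properties) == 1:
        # Single property requested: a flat list of values
        return [node.get(properties[0]) for node in neighbor_nodes]
    # Several properties requested: a list of dictionaries
    return [{p: node.get(p) for p in properties} for node in neighbor_nodes]


neighbors = [{"name": "Alice", "age": 30, "city": "Rome"},
             {"name": "Bob", "age": 41, "city": "Oslo"}]    # made-up data

names = shape_results(neighbors, "name")
pairs = shape_results(neighbors, ["name", "age"])
print(names)    # ['Alice', 'Bob']
print(pairs)    # [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 41}]
```
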
name | arguments | returns |
---|---|---|
create_data_node | class_node :Union[int, str], properties = None, extra_labels = None, new_uri=None, silently_drop=False | Union[int, None] |
Create a single new data node, of the type indicated by specified Class, with the given (possibly None) properties, and optional extra label(s); the name of the Class is always used as a label. If the requested Class doesn't exist, an Exception is raised. CAUTION: no check is made whether another data node with identical fields already exists; if that should be prevented, use add_data_node_merge() instead. The new data node, if successfully created, will optionally be assigned the passed URI value (new_uri) for its field `uri`. Note: the responsibility for picking a URI belongs to the calling function, which will typically make use of a namespace, and make use of reserve_next_uri() Alternatives: - If the data node needs to be created with links to other existing data nodes, use add_data_node_with_links() instead. - If creating multiple data nodes at once, consider using import_pandas_nodes() :param class_node: Either an integer with the internal database ID of an existing Class node, or a string with its name :param properties: (OPTIONAL) Dictionary with the properties of the new data node. EXAMPLE: {"make": "Toyota", "color": "white"} :param extra_labels:(OPTIONAL) String, or list/tuple of strings, with label(s) to assign to the new data node, IN ADDITION TO the Class name (which is always used as label) :param new_uri: (OPTIONAL) If new_uri is provided, then a field called "uri" is set to that value; also, an extra attribute named "schema_code" gets set (based on the Class to use for this Data Node); this extra attribute might eventually get obsoleted :param silently_drop: If True, any requested properties not allowed by the Schema are simply dropped; otherwise, an Exception is raised if any property isn't allowed Note: only applicable for "Strict" schema - with a "Lenient" schema anything goes :return: The internal database ID of the new data node just created, if created; or None if not created |
name | arguments | returns |
---|---|---|
_prepare_data_node_labels | class_name :str, extra_labels=None | [str] |
Return a list of labels to use on a Data Node, given its Schema Class (whose name is always used as one of the labels) and an optional list of extra labels. The given Class name must be valid, but the Class does not need to exist yet. Any leading/trailing blanks in the extra labels are removed. Duplicate names are ignored. :param class_name: The name of a Schema Class :param extra_labels: [OPTIONAL] Either a string, list/tuple of strings :return: |
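
The label-assembly rules above (Class name always included, blanks trimmed, duplicates ignored) might look like the sketch below. It is an assumption-laden illustration, not the library's code: `prepare_labels_sketch` is a hypothetical name, and placing the Class name first while preserving the order of the extra labels is a choice made here, not something the description guarantees.

```python
def prepare_labels_sketch(class_name: str, extra_labels=None) -> list:
    """Assemble the labels for a Data Node from its Class name and optional extras."""
    if extra_labels is None:
        extra_labels = []
    elif isinstance(extra_labels, str):
        extra_labels = [extra_labels]       # a single label passed as a string

    labels = [class_name]                   # the Class name is always a label
    for label in extra_labels:
        label = label.strip()               # drop leading/trailing blanks
        if label not in labels:             # ignore duplicates
            labels.append(label)
    return labels


labels = prepare_labels_sketch("Cars", ["  Vehicle ", "Cars", "Vehicle", "Asset"])
print(labels)       # ['Cars', 'Vehicle', 'Asset']
```
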
name | arguments | returns |
---|---|---|
_create_data_node_helper | class_internal_id :int, labels=None, properties_to_set=None, uri_namespace=None, primary_key=None, duplicate_option=None | Union[int, None] |
Helper function, to (possibly) create a new data node, of the type indicated by specified Class, with the given label(s) and properties. IMPORTANT: all validations/schema checks are assumed to have been performed by the caller functions; this is a private method not meant for the end user! :param class_internal_id: The internal database ID of an existing Class node in the Schema :param labels: String, or list/tuple of strings, with label(s) to assign to the new Data node, (note: the Class name is expected to be among the labels) :param properties_to_set: [OPTIONAL] Dictionary with the properties of the new data node. EXAMPLE: {"make": "Toyota", "color": "white"} :param uri_namespace: [OPTIONAL] String with a namespace to use to auto-assign a uri value on the new data node; if not passed, no uri value gets set on the new node :param primary_key: [OPTIONAL] Name of a field that is to be regarded as a primary key :param duplicate_option: Only applicable if primary_key is specified; if provided, must be "merge" or "replace" :return: If a new Data node gets created, return its internal database ID; otherwise (in case of a duplicate node already present) return None |
name | arguments | returns |
---|---|---|
add_data_node_merge | class_name :str, properties :dict | (int, bool) |
A new Data Node gets created ONLY IF there's no other Data Node containing the same specified properties (and possibly unspecified others), and attached to the given Class. An Exception is raised if any of the requested properties is not registered with the given Schema Class, or if that Class doesn't accept Data Nodes. :param class_name: The Class node for the Data Node to locate, or create if not found :param properties: A dictionary with the properties to look up the Data Node by, or to give to a new one if an existing one wasn't found. EXAMPLE: {"make": "Toyota", "color": "white"} :return: A pair with: 1) The internal database ID of either an existing Data Node or of a new one just created 2) True if a new Data Node was created, or False if not (i.e. an existing one was found) |
name | arguments | returns |
---|---|---|
add_data_column_merge | class_name :str, property_name: str, value_list: list | dict |
Add a data column (i.e. a set of single-property data nodes). Individual nodes are created only if there's no other data node with the same property/value :param class_name: The Class node for the Data Node to locate, or create if not found :param property_name: The name of the data column (i.e. the name of the data field) :param value_list: A list of values that make up the data column :return: A dictionary with 2 keys - "new_nodes" and "old_nodes"; their values are the respective numbers of nodes (created vs. found) |
name | arguments | returns |
---|---|---|
add_data_node_with_links | class_name = None, class_internal_id = None, properties = None, labels = None, links = None, assign_uri=False, new_uri=None | int |
# TODO: eventually absorb into create_data_node() This is NeoSchema's counterpart of NeoAccess.create_node_with_links() Add a new data node, of the Class specified by its name, with the given (possibly none) attributes and label(s), optionally linked to other, already existing, DATA nodes. If the specified Class doesn't exist, or doesn't allow for Data Nodes, an Exception is raised. The new data node, if successfully created: 1) will be given the Class name as a label, unless labels are specified 2) will optionally be assigned a unique "uri" value that is either automatically assigned or passed. EXAMPLES: add_data_node_with_links(class_name="Cars", properties={"make": "Toyota", "color": "white"}, links=[{"internal_id": 123, "rel_name": "OWNED_BY", "rel_dir": "IN"}]) TODO: verify that all the passed attributes are indeed properties of the class (if the schema is Strict) TODO: verify that required attributes are present TODO: verify that all the requested links conform to the Schema TODO: invoke special plugin-code, if applicable??? TODO: maybe rename to add_data_node() :param class_name: The name of the Class that this new data node is an instance of. Also used to set a label on the new node, if labels isn't specified :param class_internal_id: OPTIONAL alternative to class_name. If both specified, class_internal_id prevails TODO: merge class_name and class_internal_id into class_node, as done for create_data_node() :param properties: An optional dictionary with the properties of the new data node. EXAMPLE: {"make": "Toyota", "color": "white"} :param labels: OPTIONAL string, or list of strings, with label(s) to assign to the new data node; if not specified, use the Class name. 
TODO: ALWAYS include the Class name, as done in create_data_node() :param links: OPTIONAL list of dicts identifying existing nodes, and specifying the name, direction and optional properties to give to the links connecting to them; use None, or an empty list, to indicate if there aren't any Each dict contains the following keys: "internal_id" REQUIRED - to identify an existing node "rel_name" REQUIRED - the name to give to the link "rel_dir" OPTIONAL (default "OUT") - either "IN" or "OUT" from the new node "rel_attrs" OPTIONAL - A dictionary of relationship attributes :param assign_uri: If True, the new node is given an extra attribute named "uri", with a unique auto-increment value. Default is False OBSOLETED :param new_uri: Normally, the Item ID is auto-generated, but it can also be provided (Note: MUST be unique) If new_uri is provided, then assign_uri is automatically made True :return: If successful, an integer with the internal database ID of the node just created; otherwise, an Exception is raised |
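
The structure of the `links` argument (required "internal_id" and "rel_name", optional "rel_dir" defaulting to "OUT", optional "rel_attrs") can be checked up front before calling the library. The helper below, `normalize_links`, is hypothetical and not part of NeoSchema; it is a sketch of how a caller might validate the list of dicts described above.

```python
def normalize_links(links) -> list:
    """Validate link specifications and fill in the documented defaults."""
    normalized = []
    for link in links or []:                # None or [] both mean "no links"
        for required in ("internal_id", "rel_name"):
            if required not in link:
                raise KeyError(f"Link specification is missing the key '{required}'")
        rel_dir = link.get("rel_dir", "OUT")        # default direction is "OUT"
        if rel_dir not in ("IN", "OUT"):
            raise ValueError(f"rel_dir must be 'IN' or 'OUT', not {rel_dir!r}")
        normalized.append({"internal_id": link["internal_id"],
                           "rel_name": link["rel_name"],
                           "rel_dir": rel_dir,
                           "rel_attrs": link.get("rel_attrs")})
    return normalized


links = normalize_links([{"internal_id": 123, "rel_name": "OWNED_BY", "rel_dir": "IN"},
                         {"internal_id": 456, "rel_name": "MADE_BY"}])
print(links[1]["rel_dir"])      # OUT
```
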
name | arguments | returns |
---|---|---|
update_data_node | data_node :Union[int, str], set_dict :dict, drop_blanks = True, class_name=None | int |
Update, possibly adding and/or dropping fields, the properties of an existing Data Node :param data_node: Either an integer with the internal database ID, or a string with a URI value :param set_dict: A dictionary of field name/values to create/update the node's attributes (note: blanks ARE allowed within the keys) Blanks at the start/end of string values are zapped :param drop_blanks: If True, then any blank field is interpreted as a request to drop that property (as opposed to setting its value to "") :param class_name: [OPTIONAL] The name of the Class that the given Data Node is part of; if provided, it gets enforced :return: The number of properties set or removed; if the record wasn't found, or an empty set_dict was passed, return 0 Important: a property is counted as "set" even if the new value is identical to the old value! |
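
The effect of `drop_blanks` on the entries of `set_dict` can be sketched in plain Python. The helper `split_updates` is hypothetical; it only shows how string values are trimmed and how blank values are routed to either "remove" or "set to empty string", under the assumptions stated in the description above.

```python
def split_updates(set_dict: dict, drop_blanks: bool = True):
    """Separate the properties to set from the properties to remove."""
    to_set, to_remove = {}, []
    for key, value in set_dict.items():
        if isinstance(value, str):
            value = value.strip()           # blanks at the start/end are zapped
            if value == "" and drop_blanks:
                to_remove.append(key)       # blank field: request to drop the property
                continue
        to_set[key] = value
    return to_set, to_remove


updates = {"color": " red ", "notes": "  ", "year": 2020}   # made-up data

to_set, to_remove = split_updates(updates, drop_blanks=True)
print(to_set, to_remove)        # {'color': 'red', 'year': 2020} ['notes']
```

With `drop_blanks=False`, "notes" would instead be set to the empty string and nothing would be removed.
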
name | arguments | returns |
---|---|---|
delete_data_nodes | class_name :str | int |
Delete all the Data Nodes of the given Schema Class :param class_name: The name of a Schema Class :return: The number of deleted Data Nodes |
name | arguments | returns |
---|---|---|
delete_data_point | uri: str, labels=None | int |
Delete the given data point. TODO: obsolete in favor of delete_data_nodes() :param uri: :param labels: OPTIONAL (generally, redundant) :return: The number of nodes deleted (possibly zero) |
name | arguments | returns |
---|---|---|
register_existing_data_node | class_name="", schema_uri=None, existing_neo_id=None, new_uri=None | int |
Register (declare to the Schema) an existing data node with the Schema Class specified by its name or ID. An uri is generated for the data node and stored on it. Return the newly-assigned uri EXAMPLES: register_existing_data_node(class_name="Chemicals", existing_neo_id=123) register_existing_data_node(schema_uri="schema-19", existing_neo_id=456) TODO: verify that all the passed attributes are indeed properties of the class (if the schema is Strict) TODO: verify that required attributes are present TODO: invoke special plugin-code, if applicable :param class_name: The name of the Class that this new data node is an instance of :param schema_uri: Alternate way to specify the Class; if both present, class_name prevails :param existing_neo_id: Internal ID to identify the node to register with the above Class. TODO: expand to use the match() structure :param new_uri: OPTIONAL. Normally, the Item ID is auto-generated, but it can also be provided (Note: MUST be unique) :return: If successful, an integer with the auto-increment "uri" value of the node just created; otherwise, an Exception is raised |
name | arguments | returns |
---|---|---|
add_data_relationship_hub | center_id :int, periphery_ids :[int], periphery_class :str, rel_name :str, rel_dir = "OUT" | int |
Add a group of relationships between a single Data Node ("center") and each of the Data Nodes in the given list ("periphery"), with the specified relationship name and direction. All Data Nodes must already exist. All the "periphery" Data Nodes must belong to the same Class (whose name is passed by periphery_class) :param center_id: Internal database ID of an existing Data Node that we wish to connect to all other Data Nodes specified in the next argument :param periphery_ids: List of internal database IDs of existing Data Nodes, all belonging to the Class passed by the next argument :param periphery_class: The name of the common Class to which all the Data Nodes specified in periphery_ids belong to :param rel_name: A string with the name to give to all the newly-created relationships :param rel_dir: Either "IN" (towards the "center" node) or "OUT" (away from it, towards the "periphery" nodes) :return: The number of relationships created |
name | arguments | returns |
---|---|---|
add_data_relationship | from_id, to_id, rel_name :str, rel_props = None, id_type=None | None |
Add a new relationship with the given name, from one to the other of the 2 given data nodes, identified by their Neo4j ID's. The requested new relationship MUST be present in the Schema, or an Exception will be raised. Note that if a relationship with the same name already exists between the data nodes, nothing gets created (and an Exception is raised) :param from_id: Either an internal database ID or a primary key value of the data node at which the new relationship is to originate :param to_id: Either an internal database ID or a primary key value of the data node at which the new relationship is to end :param rel_name:The name to give to the new relationship between the 2 specified data nodes IMPORTANT: it MUST be allowed by the Schema :param rel_props:TODO: not currently used. Unclear what multiple calls would do in this case :param id_type: OPTIONAL - name of a primary key used to identify the data nodes; for example, "uri". Leave blank to use the internal database ID's instead :return: None. If the specified relationship didn't get created (for example, in case the new relationship doesn't exist in the Schema), raise an Exception |
name | arguments | returns |
---|---|---|
remove_data_relationship | from_id :str, to_id :str, rel_name :str, id_type="uri", labels=None | None |
Drop the relationship with the given name, from one to the other of the 2 given DATA nodes. Note: the data nodes are left untouched. If the specified relationship didn't get deleted, raise an Exception :param from_id: String with the "uri" value of the data node at which the relationship originates :param to_id: String with the "uri" value of the data node at which the relationship ends :param rel_name: The name of the relationship to delete :param id_type: For now, only "uri" (default) is implemented :param labels: OPTIONAL (generally, redundant). Labels required to be on both nodes :return: None. If the specified relationship didn't get deleted, raise an Exception |
name | arguments | returns |
---|---|---|
remove_multiple_data_relationships | node_id: Union[int, str], rel_name: str, rel_dir: str, labels=None | None |
Drop all the relationships with the given name, from or to the given data node. Note: the data node is left untouched. IMPORTANT: this function cannot be used to remove relationship involving any Schema node :param node_id: The internal database ID (integer) or name (string) of the data node of interest :param rel_name: The name of the relationship(s) to delete :param rel_dir: Either 'IN', 'OUT', or 'BOTH' :param labels: [OPTIONAL] :return: None |
name | arguments | returns |
---|---|---|
import_pandas_nodes_NO_BATCH | df :pd.DataFrame, class_name: str, class_node=None, select=None, drop=None, rename=None, primary_key=None, duplicate_option="merge", datetime_cols=None, int_cols=None, extra_labels=None, uri_namespace=None, report_frequency=100 | [int] |
OLD VERSION of the much-faster import_pandas_nodes(), largely obsoleted by it! Import a group of entities (records), from the rows of a Pandas dataframe, as Data Nodes in the database. Dataframe cells with NaN's and empty strings are dropped - and never make it into the database. Note: if you have a CSV file whose first row contains the field names, you can first do imports such as df = pd.read_csv("C:/Users/me/some_name.csv", encoding = "ISO-8859-1") :param df: A Pandas Data Frame with the data to import; each row represents a record - to be turned into a graph-database node. Each column represents a Property of the data node, and it must have been previously declared in the Schema :param class_name: The name of a Class node already present in the Schema :param class_node: OBSOLETED :param select: [OPTIONAL] Name of the field, or list of names, to import; all others will be ignored (Note: original name prior to any rename, if applicable) :param drop: [OPTIONAL] Name of a field, or list of names, to ignore during import (Note: original name prior to any rename, if applicable) If both arguments "select" and "drop" are passed, an Exception gets raised :param rename: [OPTIONAL] dictionary to rename the Pandas dataframe's columns to EXAMPLE {"current_name": "name_we_want"} :param primary_key: [OPTIONAL] Name of a field that is to be regarded as a primary key; any import of a record that is a duplicate in that field, will result in the modification of the existing record, rather than the creation of new one; the details of the modification are based on the argument `duplicate_option' :param duplicate_option: Only applicable if primary_key is specified; if provided, must be "merge" (default) or "replace". 
Any field present in both the original (old) and the new (being imported) record will get over-written with the new value; any field present in the original record but not the new one will EITHER be left standing ("merge" option) or ditched ("replace" option) EXAMPLE: if the database contains the record {'vehicle ID': 'c2', 'make': 'Toyota', 'year': 2013} then the import of {'vehicle ID': 'c2', 'make': 'BMW', 'color': 'white'} with a primary_key of 'vehicle ID', will result in NO new record addition; the existing record will transform into either (if duplicate_option is "merge"): {'vehicle ID': 'c2', 'make': 'BMW', 'color': 'white', 'year':2013} (if duplicate_option is "replace"): {'vehicle ID': 'c2', 'make': 'BMW', 'color': 'white'} Notice that the only difference between the 2 options is the fields present in the original record but not in the imported one. :param datetime_cols:[OPTIONAL] String, or list/tuple of strings, of column name(s) that contain datetime strings such as '2015-08-15 01:02:03' (compatible with the python "datetime" format) :param int_cols: [OPTIONAL] String, or list/tuple of strings, of column name(s) that contain integers, or that are to be converted to integers (typically necessary because numeric Pandas columns with NaN's are automatically turned into floats; this argument will cast them to int's, and drop the NaN's) :param extra_labels:[OPTIONAL] String, or list/tuple of strings, with label(s) to assign to the new Data nodes, IN ADDITION TO the Class name (which is always used as label) :param uri_namespace:[OPTIONAL] String with a namespace to use to auto-assign uri values on the new Data nodes; if that namespace hasn't previously been created with create_namespace() or with reserve_next_uri(), a new one will be created with no prefix nor suffix (i.e. all uri's will be numeric strings.) 
If not passed, no uri values will get set on the new nodes :param report_frequency: [OPTIONAL] How often to print the status of the import-in-progress (default 100) :return: A list of the internal database ID's of the newly-created Data nodes |
name | arguments | returns |
---|---|---|
import_pandas_nodes | df :pd.DataFrame, class_name: str, select=None, drop=None, rename=None, primary_key=None, duplicate_option="merge", datetime_cols=None, int_cols=None, extra_labels=None, report=True, report_frequency=1, max_batch_size=1000 | dict |
Import a group of entities (records), from the rows of a Pandas dataframe, as Data Nodes in the database. Dataframe cells with NaN's and empty strings are dropped - and never make it into the database. Note: if you have a CSV file whose first row contains the field names, you can first do imports such as df = pd.read_csv("C:/Users/me/some_name.csv", encoding = "ISO-8859-1") :param df: A Pandas Data Frame with the data to import; each row represents a record - to be turned into a graph-database node. Each column represents a Property of the data node, and it must have been previously declared in the Schema :param class_name: The name of a Class node already present in the Schema :param select: [OPTIONAL] Name of the Pandas field, or list of names, to import; all others will be ignored (Note: original name prior to any rename, if applicable) :param drop: [OPTIONAL] Name of a Pandas field, or list of names, to ignore during import (Note: original name prior to any rename, if applicable) If both arguments "select" and "drop" are passed, an Exception gets raised :param rename: [OPTIONAL] dictionary to rename the Pandas dataframe's column names to EXAMPLE {"current_name": "name_we_want"} :param primary_key: [OPTIONAL] Name of a Pandas field that is to be regarded as a primary key; any import of a record that is a duplicate in that field, will result in the modification of the existing record, rather than the creation of new one; the details of the modification are based on the argument `duplicate_option' (Note: original name prior to any rename, if applicable) :param duplicate_option: Only applicable if primary_key is specified; if provided, must be "merge" (default) or "replace". 
Any field present in both the original (old) and the new (being imported) record will get over-written with the new value; any field present in the original record but not the new one will EITHER be left standing ("merge" option) or ditched ("replace" option) EXAMPLE: if the database contains the record {'vehicle ID': 'c2', 'make': 'Toyota', 'year': 2013} then the import of {'vehicle ID': 'c2', 'make': 'BMW', 'color': 'white'} with a primary_key of 'vehicle ID', will result in NO new record addition; the existing record will transform into either (if duplicate_option is "merge"): {'vehicle ID': 'c2', 'make': 'BMW', 'color': 'white', 'year':2013} (if duplicate_option is "replace"): {'vehicle ID': 'c2', 'make': 'BMW', 'color': 'white'} Notice that the only difference between the 2 options is the fields present in the original record but not in the imported one. :param datetime_cols: [OPTIONAL] String, or list/tuple of strings, of column name(s) that contain datetime strings such as '2015-08-15 01:02:03' (compatible with the python "datetime" format) :param int_cols: [OPTIONAL] String, or list/tuple of strings, of column name(s) that contain integers, or that are to be converted to integers (typically necessary because numeric Pandas columns with NaN's are automatically turned into floats; this argument will cast them to int's, and drop the NaN's) :param extra_labels: [OPTIONAL] String, or list/tuple of strings, with label(s) to assign to the new Data nodes, IN ADDITION TO the Class name (which is always used as label) :param report: [OPTIONAL] If True (default), print the status of the import-in-progress at the end of each batch round :param report_frequency: [OPTIONAL] Only applicable if report is True :param max_batch_size: To limit the number of Pandas rows loaded into the database at one time :return: A dict with 2 keys: 'number_nodes_created': the number of newly-created nodes 'affected_nodes_ids': list of the internal database ID's of the nodes that were created or updated, 
in the import order ("updated" doesn't necessarily mean changed). Note that ID's might occur more than once when the "primary_key" arg is specified, because imports might then refer to existing, or previously-created, nodes. |
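
The difference between the "merge" and "replace" values of `duplicate_option` can be reproduced on plain dictionaries. The function `apply_duplicate_option` below is hypothetical and works on in-memory records only; it is a sketch of the documented outcome, not of how the import talks to the database.

```python
def apply_duplicate_option(existing: dict, incoming: dict,
                           duplicate_option: str = "merge") -> dict:
    """Combine an existing record with an imported one sharing the same primary key."""
    if duplicate_option == "merge":
        # Fields only present in the old record are left standing
        return {**existing, **incoming}
    if duplicate_option == "replace":
        # Fields only present in the old record are ditched
        return dict(incoming)
    raise ValueError('duplicate_option must be "merge" or "replace"')


old_record = {"vehicle ID": "c2", "make": "Toyota", "year": 2013}
new_record = {"vehicle ID": "c2", "make": "BMW", "color": "white"}

merged = apply_duplicate_option(old_record, new_record, "merge")
replaced = apply_duplicate_option(old_record, new_record, "replace")
print(merged)       # {'vehicle ID': 'c2', 'make': 'BMW', 'year': 2013, 'color': 'white'}
print(replaced)     # {'vehicle ID': 'c2', 'make': 'BMW', 'color': 'white'}
```
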
name | arguments | returns |
---|---|---|
import_pandas_links | df :pd.DataFrame, class_from :str, class_to :str, col_from :str, col_to :str, link_name :str, col_link_props=None, name_map=None, skip_errors = False, report_frequency=100 | [int] |
Import a group of relationships between existing database Data Nodes, from the rows of a Pandas dataframe, as database links between the existing Data Nodes. :param df: A Pandas Data Frame with the data RELATIONSHIP to import :param class_from: Name of the Class of the data nodes that the relationship originates from :param class_to: Name of the Class of the data nodes that the relationship ends into :param col_from: Name of the Data Frame column identifying the data nodes from which the relationship starts (the values are expected to be foreign keys) :param col_to: Name of the Data Frame column identifying the data nodes to which the relationship ends (the values are expected to be foreign keys) :param link_name: Name of the new relationship being created :param col_link_props: [OPTIONAL] Name of a property to assign to the relationships, as well as name of the Data Frame column containing the values. Any NaN values are ignored (no property set on that relationship.) :param name_map: [OPTIONAL] Dict with mapping from Pandas column names to Property names in the data nodes in the database :param skip_errors: [OPTIONAL] If True, the import continues even in the presence of errors; default is False :param report_frequency: [OPTIONAL] How often to print out the status of the import-in-progress (in terms of number of imported links) :return: A list of the internal database ID's of the created links |
name | arguments | returns |
---|---|---|
scrub_dict | d :dict | dict |
Helper function to clean up data during imports. Given a dictionary, assemble and return a new dict where string values are trimmed of any leading or trailing blanks. Entries whose values are blank or NaN get omitted from the new dictionary being returned. EXAMPLE: {"a": 1, "b": 3.5, "c": float("nan"), "d": "some value", "e": " needs cleaning! ", "f": "", "g": " "} gets simplified to: {"a": 1, "b": 3.5, "d": "some value", "e": "needs cleaning!" } :param d: A python dictionary with data to "clean up" :return: A python dictionary with the cleaned-up data |
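
The clean-up rules above can be sketched in a few lines of plain Python. The function name `scrub_dict_sketch` is hypothetical and this is not the library's code; in particular, it only recognizes NaN stored as a Python float, and how the real function treats other values (such as None) is not covered here.

```python
import math


def scrub_dict_sketch(d: dict) -> dict:
    """Hypothetical re-creation of the documented clean-up rules."""
    cleaned = {}
    for key, value in d.items():
        if isinstance(value, str):
            value = value.strip()       # trim leading/trailing blanks
            if value == "":
                continue                # omit blank strings
        elif isinstance(value, float) and math.isnan(value):
            continue                    # omit NaN values
        cleaned[key] = value
    return cleaned


sample = {"a": 1, "b": 3.5, "c": float("nan"), "d": "some value",
          "e": "   needs cleaning!    ", "f": "", "g": "    "}
result = scrub_dict_sketch(sample)
```
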
name | arguments | returns |
---|---|---|
import_triplestore | df :pd.DataFrame, class_node :Union[int, str], col_names = None, uri_prefix = None, datetime_cols=None, int_cols=None, extra_labels=None, report_frequency=100 | [int] |
Import "triplestore" data from a Pandas dataframe that contains 3 columns called: subject , predicate , object The values of the "subject" column are used for identifying entities, and then turned into URI's. The values of the "predicate" column are taken to be the names of the Properties (possibly mapped by means of the dictionary "col_names"). The values of the "object" column are taken to be the values (literals) of the Properties. Note: the values of "subject" and "predicate" are typically integers or strings EXAMPLE - Pandas data frame: subject predicate object 0 57 1 Advanced Graph Databases 1 57 2 New York University 2 57 3 Fall 2024 col_names = {1: "Course Title", 2: "School", 3: "Semester"} uri_prefix = "r-" The above will result in the import of a node with the following properties: {"uri": "r-57", "Course Title": "Advanced Graph Databases", "School": "New York University", "Semester": "Fall 2024"} :param df: A Pandas dataframe that contains 3 columns called: subject , predicate , object :param class_node: Either an integer with the internal database ID of an existing Class node, or a string with its name :param col_names: [OPTIONAL] Dict with mapping from values in the "predicate" column of the data frame to the names of the new nodes' Properties :param uri_prefix: [OPTIONAL] String to prefix to the values in the "subject" column :param datetime_cols: [SEE import_pandas_nodes()] :param int_cols: [SEE import_pandas_nodes()] :param extra_labels: [SEE import_pandas_nodes()] :param report_frequency:[SEE import_pandas_nodes()] :return: A list of the internal database ID's of the newly-created Data nodes |
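
The way triples are grouped into one set of properties per subject can be sketched without Pandas or a database. The helper `triples_to_records` below is hypothetical and works on a plain list of (subject, predicate, object) tuples; it only shows the grouping and renaming logic, not the actual import.

```python
from collections import defaultdict


def triples_to_records(triples, col_names=None, uri_prefix=""):
    """Hypothetical helper: group (subject, predicate, object) triples
    into one property dictionary per subject."""
    col_names = col_names or {}
    records = defaultdict(dict)
    for subject, predicate, obj in triples:
        # Map the predicate to a Property name, if a mapping was provided
        prop_name = col_names.get(predicate, predicate)
        records[subject][prop_name] = obj
    # The subject, with the optional prefix, becomes the "uri" of each record
    return [{"uri": f"{uri_prefix}{subject}", **props}
            for subject, props in records.items()]


triples = [(57, 1, "Advanced Graph Databases"),
           (57, 2, "New York University"),
           (57, 3, "Fall 2024")]

records = triples_to_records(
    triples,
    col_names={1: "Course Title", 2: "School", 3: "Semester"},
    uri_prefix="r-")
```
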
name | arguments | returns |
---|---|---|
import_json_data | json_str: str, class_name: str, parse_only=False, provenance=None | Union[None, int, List[int]] |
Import the data specified by a JSON string into the database - but only the data that is described in the existing Schema; anything else is silently ignored. CAUTION: A "postorder" approach is followed: create subtrees first (with recursive calls), then create the root last; as a consequence, in case of failure mid-import, there's no top root, and there could be several fragments. A partial import might need to be manually deleted. TODO: maintain a list of all created nodes - so as to be able to delete them all in case of failure. :param json_str: A JSON string representing (at the top level) an object or a list to import :param class_name: Name of Schema class to use for the top-level element(s) :param parse_only: Flag indicating whether to stop after the parsing (i.e. no database import) :param provenance: Metadata (such as a file name) to store in the "source" attribute of a special extra node ("Import Data") :return: |
name | arguments | returns |
---|---|---|
create_data_nodes_from_python_data | data, class_name: str, provenance=None | [int] |
Import the data specified by the "data" python structure into the database - but only the data that is described in the existing Schema; anything else is silently ignored. For additional notes, see import_json_data() :param data: A python dictionary or list, with the data to import :param class_name: The name of the Schema Class for the root node(s) of the imported data :param provenance: Optional string to be stored in a "source" attribute in a special "Import Data" node for metadata about the import :return: List (possibly empty) of internal database ID's of the root node(s) created TODO: * The "Import Data" Class must already be in the Schema; should automatically add it, if not already present * DIRECTION OF RELATIONSHIP (cannot be specified by Python dict/JSON) * LACK OF "Import Data" node (ought to be automatically created if needed) * LACK OF "BA" (or "DATA"?) labels being set * INABILITY TO LINK TO EXISTING NODES IN DBASE (try using: "uri": some_int as the only property in nodes to merge) * OFFER AN OPTION TO IGNORE BLANK STRINGS IN ATTRIBUTES * INTERCEPT AND BLOCK IMPORTS FROM FILES ALREADY IMPORTED * issue some report about any part of the data that doesn't match the Schema, and got silently dropped |
name | arguments | returns |
---|---|---|
create_tree_from_dict | d: dict, class_name: str, level=1, cache=None | Union[int, None] |
Add a new data node (which may turn into a tree root) of the specified Class, with data from the given dictionary: 1) literal values in the dictionary are stored as attributes of the node, using the keys as names 2) other values (such as dictionaries or lists) are recursively turned into subtrees, linked from the new data node through outbound relationships using the dictionary keys as names Return the Neo4j ID of the newly created root node, or None if nothing is created (this typically arises in recursive calls that "skip subtrees") IMPORTANT: any part of the data that doesn't match the Schema, gets silently dropped. TODO: issue some report about anything that gets dropped EXAMPLES: (1) {"state": "California", "city": "Berkeley"} results in the creation of a new node, with 2 attributes, named "state" and "city" (2) {"name": "Julian", "address": {"state": "California", "city": "Berkeley"}} results in the creation of 2 nodes, namely the tree root (with a single attribute "name"), with an outbound link named "address" to another node (the subtree) that has the "state" and "city" attributes (3) {"headquarter_state": [{"state": "CA"}, {"state": "NY"}, {"state": "FL"}]} results in the creation of a node (the tree root), with no attributes, and 3 links named "headquarter_state" to, respectively, 3 nodes - each of which containing a "state" attribute (4) {"headquarter_state": ["CA", "NY", "FL"]} similar to (3), above, but the children nodes will use the default attribute name "value" :param d: A dictionary with data from which to create a tree in the database :param class_name: The name of the Schema Class for the root node(s) of the imported data :param level: The level of the recursive call (used for debug printing) :return: The Neo4j ID of the newly created node, or None if nothing is created (this typically arises in recursive calls that "skip subtrees") |
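
The mapping from nested dictionaries to nodes and links, as in examples (2) and (4) above, can be sketched with in-memory lists standing in for the database. The function `build_tree` is hypothetical. Two simplifying assumptions apply: no Schema filtering is performed (nothing is dropped), and each parent is created before its children, so the creation order may differ from the library's.

```python
def build_tree(d: dict, nodes: list, links: list) -> int:
    """Hypothetical sketch: store literals as node attributes, and turn
    nested dicts/lists into child nodes linked by the dictionary key.
    Returns the index in `nodes` of the node created for `d`."""
    node_id = len(nodes)
    nodes.append({})                        # attributes of the new node
    for key, value in d.items():
        if isinstance(value, dict):
            children = [value]
        elif isinstance(value, list):
            children = value
        else:
            nodes[node_id][key] = value     # literal: stored as an attribute
            continue
        for child in children:
            if not isinstance(child, dict):
                child = {"value": child}    # default attribute name for literals
            child_id = build_tree(child, nodes, links)
            links.append((node_id, key, child_id))
    return node_id


nodes2, links2 = [], []
build_tree({"name": "Julian",
            "address": {"state": "California", "city": "Berkeley"}},
           nodes2, links2)

nodes4, links4 = [], []
build_tree({"headquarter_state": ["CA", "NY", "FL"]}, nodes4, links4)
```

In the first case, `nodes2` holds the root (with "name") and one child, joined by a single "address" link; in the second, the root has no attributes and three "headquarter_state" links to children that each carry a "value" attribute.
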
name | arguments | returns |
---|---|---|
create_trees_from_list | l: list, class_name: str, level=1, cache=None | [int] |
Add a set of new data nodes (the roots of the trees), all of the specified Class, with data from the given list. Each list element MUST be a literal, a dictionary or a list: - if a literal, it first gets turned into a dictionary of the form {"value": literal_element}; - if a dictionary, it gets processed by create_tree_from_dict() - if a list, it generates a recursive call Return a list of the Neo4j ID's of the newly created nodes. IMPORTANT: any part of the data that doesn't match the Schema, gets silently dropped. TODO: issue some report about that EXAMPLE: If the Class is named "address" and has 2 properties, "state" and "city", then the data: [{"state": "California", "city": "Berkeley"}, {"state": "Texas", "city": "Dallas"}] will give rise to 2 new data nodes with label "address", and each of them having a "SCHEMA" link to the shared Class node. :param l: A list of data from which to create a set of trees in the database :param class_name: The name of the Schema Class for the root node(s) of the imported data :param level: The level of the recursive call (used for debug printing) :return: A list of the Neo4j ID's of the newly created nodes (each of which might be a root of a tree) |
name | arguments | returns |
---|---|---|
create_schema_from_sample_data | match | |
Create a Schema from sample data nodes, for example as created with the Arrow app TODO: NOT YET COMPLETED. NOT FOR PRODUCTION :param match: # Maybe allow a label, or range of ID's, instead :return: |
name | arguments | returns |
---|---|---|
export_schema | cls | {} |
Export all the Schema nodes and relationships as a JSON string. IMPORTANT: APOC must be activated in the database, to use this function. Otherwise it'll raise an Exception :return: A dictionary specifying the number of nodes exported, the number of relationships, and the number of properties, as well as a "data" field with the actual export as a JSON string |
name | arguments | returns |
---|---|---|
is_valid_uri | uri :str | bool |
Check the validity of the passed uri. If the uri belongs to a Schema node, a tighter check can be performed with is_valid_schema_uri() :param uri: A string with a value that is expected to be a uri of a node :return: True if the passed uri has a valid value, or False otherwise |
name | arguments | returns |
---|---|---|
is_valid_schema_uri | schema_uri :str | bool |
Check the validity of the passed Schema uri. It should be of the form "schema-n" for some integer n To check the validity of the uri of a Data node rather than a Schema node, use is_valid_uri() instead :param schema_uri: A string with a value that is expected to be a uri of a Schema node :return: True if the passed uri has a valid value, or False otherwise |
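
A check for the "schema-n" form can be sketched with a regular expression. The function name `looks_like_schema_uri` is hypothetical, and the exact rules applied by is_valid_schema_uri() (for instance, whether leading zeros or surrounding blanks are tolerated) are assumptions here.

```python
import re

# Assumed pattern: the literal "schema-" followed by one or more ASCII digits
SCHEMA_URI_PATTERN = re.compile(r"schema-[0-9]+")


def looks_like_schema_uri(schema_uri) -> bool:
    """Hypothetical check: True only for strings of the form 'schema-n'."""
    return (isinstance(schema_uri, str)
            and SCHEMA_URI_PATTERN.fullmatch(schema_uri) is not None)


checks = {candidate: looks_like_schema_uri(candidate)
          for candidate in ["schema-12", "schema-", "schema-x", " schema-3", "doc-7"]}
```
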
name | arguments | returns |
---|---|---|
assign_uri | internal_id :int, namespace="data_node" | str |
Given an existing Data Node that lacks a URI value, assign one to it (and save it in the database.) If a URI value already exists on the node, an Exception is raised :param internal_id: Internal database ID to identify a Data Node that currently lacks a URI value :param namespace: A string used to maintain completely separate groups of auto-increment values; leading/trailing blanks are ignored :return: A string with the newly-assigned URI value |
name | arguments | returns |
---|---|---|
create_namespace | name :str, prefix="", suffix="" | None |
Set up a new namespace for URI's. :param name: A string used to maintain completely separate groups of auto-increment values; leading/trailing blanks are ignored :param prefix: (OPTIONAL) String to prefix to the auto-increment number; it will be stored in the database :param suffix: (OPTIONAL) String to suffix to the auto-increment number; it will be stored in the database :return: None |
name | arguments | returns |
---|---|---|
namespace_exists | name :str | bool |
Return True if the specified namespace already exists, or False otherwise :param name: :return: |
name | arguments | returns |
---|---|---|
reserve_next_uri | namespace="data_node", prefix="", suffix="" | str |
Generate and reserve a URI (or fragment thereof, aka "token"), using the given namespace and, optionally, the given prefix and/or suffix. The middle part of the generated URI is a unique auto-increment value (separately maintained for various groups, or "namespaces"). If the requested namespace is not the default one, make sure to first create it with create_namespace() If no prefix or suffix is specified, use the values provided when the namespace was first created. EXAMPLES: reserve_next_uri("Document", "doc.", ".new") might produce "doc.3.new" reserve_next_uri("Image", prefix="i-") might produce "i-123" IMPORTANT: Prefixes and suffixes only need to be passed when first creating a new namespace; if they're passed in here, they over-ride their stored counterparts. Note that the returned uri is de-facto "permanently reserved" on behalf of the calling function, and can't be used by any other competing thread, thus avoiding concurrency problems (race conditions) :param namespace: A string used to maintain completely separate groups of auto-increment values; leading/trailing blanks are ignored. It must exist, unless the default value is accepted (in which case, it gets created as needed) :param prefix: (OPTIONAL) String to prefix to the auto-increment number. If it's the 1st call for the given namespace, store it in the database; otherwise, if a value is passed, use it to over-ride the stored one :param suffix: (OPTIONAL) String to suffix to the auto-increment number. If it's the 1st call for the given namespace, store it in the database; otherwise, if a value is passed, use it to over-ride the stored one :return: A string (with the prefix and suffix from above) that contains an integer that is a unique auto-increment for the specified namespace (starting with 1); it's ready-to-use and "reserved", i.e. could be used at any future time |
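
The interplay between namespaces, stored prefixes/suffixes and per-call over-rides can be sketched with an in-memory stand-in for the database. The class `UriReserver` is hypothetical. It assumes that the default "data_node" namespace starts out with no prefix or suffix, and that a per-call over-ride applies to that call only; the library may differ on both points.

```python
class UriReserver:
    """Hypothetical in-memory stand-in for the namespace bookkeeping."""

    def __init__(self):
        self._namespaces = {}   # name -> {"count": int, "prefix": str, "suffix": str}

    def create_namespace(self, name: str, prefix="", suffix="") -> None:
        name = name.strip()     # leading/trailing blanks are ignored
        if name in self._namespaces:
            raise ValueError(f"Namespace {name!r} already exists")
        self._namespaces[name] = {"count": 0, "prefix": prefix, "suffix": suffix}

    def reserve_next_uri(self, namespace="data_node", prefix="", suffix="") -> str:
        namespace = namespace.strip()
        if namespace not in self._namespaces:
            if namespace != "data_node":
                raise KeyError(f"Namespace {namespace!r} must be created first")
            self.create_namespace(namespace)    # default namespace: created as needed
        entry = self._namespaces[namespace]
        entry["count"] += 1                     # auto-increment, starting with 1
        # Values passed here over-ride the stored ones (for this call, in this sketch)
        used_prefix = prefix or entry["prefix"]
        used_suffix = suffix or entry["suffix"]
        return f"{used_prefix}{entry['count']}{used_suffix}"


reserver = UriReserver()
reserver.create_namespace("Image", prefix="i-")
uri_a = reserver.reserve_next_uri("Image")
uri_b = reserver.reserve_next_uri("Image")
uri_c = reserver.reserve_next_uri("Image", prefix="img.", suffix=".new")
uri_d = reserver.reserve_next_uri()             # default namespace
```

Each namespace keeps its own counter, so the first URI in the default namespace starts again from 1 regardless of how many "Image" URI's were reserved. Unlike the database-backed original, this sketch makes no attempt at thread safety.
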
name | arguments | returns |
---|---|---|
advance_autoincrement | namespace :str, advance=1 | (int, str, str) |
Utilize an ATOMIC database operation to both read AND advance the autoincrement counter, based on a (single) node that: 1) contains the label `Schema Autoincrement` 2) and also contains, as an attribute, the desired namespace (group); if no such node exists (for example, after a new installation), an Exception is raised. Note that the returned number (and every number in the implied sequence, if advance > 1) is de-facto "permanently reserved" on behalf of the calling function, and can't be used by any other competing thread, thus avoiding concurrency problems (race conditions) :param namespace: A string used to maintain completely separate groups of auto-increment values; leading/trailing blanks are ignored :param advance: Normally, auto-increment advances by 1 unit, but a different positive integer may be used to "reserve" a group of numbers in the above namespace :return: An integer that is a unique auto-increment for the specified namespace (starting with 1); it's ready-to-use and "reserved", i.e. could be used at any future time. If advance > 1, the first of the reserved numbers is returned |
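
The reason for reading and advancing the counter in a single atomic step can be illustrated with threads. In the sketch below, a `threading.Lock` stands in for the atomic database operation; the class `AutoincrementCounter` is hypothetical and handles a single namespace only.

```python
import threading


class AutoincrementCounter:
    """Hypothetical single-namespace counter; the lock plays the role
    of the atomic database operation."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next_free = 1                 # auto-increments start with 1

    def advance(self, advance=1) -> int:
        if not isinstance(advance, int) or advance < 1:
            raise ValueError("advance must be a positive integer")
        with self._lock:                    # read AND advance as one step
            first = self._next_free
            self._next_free += advance      # reserves first .. first+advance-1
        return first


counter = AutoincrementCounter()
reserved = []                               # first number of every reserved block
reserved_lock = threading.Lock()


def worker():
    for _ in range(50):
        first = counter.advance(advance=5)
        with reserved_lock:
            reserved.append(first)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because each block of 5 numbers is claimed inside the lock, the 200 blocks reserved by the 4 threads never overlap, whatever the thread scheduling happens to be.
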
name | arguments | returns |
---|---|---|
_next_available_schema_uri | cls | str |
Return the next available uri for nodes managed by this class. For unique uri's to use on Data Nodes, use reserve_next_uri() instead :return: A string based on unique auto-increment values, used for Schema nodes |
name | arguments | returns |
---|---|---|
assign_namespace_to_class | class_name :str, namespace :str | None |
Link up a Class node to the node of a namespace to be used for data nodes of that Class :param class_name: :param namespace: :return: None |
name | arguments | returns |
---|---|---|
lookup_class_namespace | class_name :str | Union[str, None] |
Look up the namespace, if any, assigned to the given Class, by means of a standard "HAS_URI_GENERATOR" relationship. If not found, return None :param class_name: Name of a Schema Class :return: |
name | arguments | returns |
---|---|---|
generate_uri | class_name :str | str |
Use, as appropriate for the given Class, a specific namespace - or the general data node namespace - to generate a URI to use on a newly-created Data Node :param class_name: Name of a Schema Class :return: |
name | arguments | returns |
---|---|---|
debug_print | info: str, trim=False | None |
If the class' property "debug" is set to True, print out the passed info string, optionally trimming it, if too long :param info: :param trim: (OPTIONAL) Flag indicating whether to only print a shortened version :return: None |